[ 
https://issues.apache.org/jira/browse/SSHD-1197?focusedWorklogId=622850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-622850
 ]

ASF GitHub Bot logged work on SSHD-1197:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jul/21 04:17
            Start Date: 15/Jul/21 04:17
    Worklog Time Spent: 10m 
      Work Description: lgoldstein commented on a change in pull request #201:
URL: https://github.com/apache/mina-sshd/pull/201#discussion_r670119533



##########
File path: 
sshd-core/src/main/java/org/apache/sshd/common/session/helpers/AbstractSession.java
##########
@@ -109,6 +109,16 @@
      */
     public static final String SESSION = "org.apache.sshd.session";
 
+    /**
+     * A last-resort timeout for waiting after having received a KEX_INIT message from the peer until we have prepared
+     * our own KEX proposal. This timeout should actually never be hit unless there is a serious deadlock somewhere and
+     * the session is never closed.
+     *
+     * @see <a href="https://issues.apache.org/jira/browse/SSHD-1197">SSHD-1197</a>
+     * @see #doKexNegotiation()
+     */
+    private static final Duration KEX_PROPOSAL_SETUP_TIMEOUT = Duration.ofSeconds(42);

Review comment:
       >>>  If we make that configurable, users can very easily shoot 
themselves in the foot by setting a (very) low value, like 0.
   
   I understand your concern, but personally I believe in total user 
responsibility - I don't like being told by someone that "they know better", so 
I am reluctant to impose my opinion on others (BTW, this is also why I always 
insist that all our internal variables and/or methods be accessible either via 
getters or as public/protected members).
   
   In this context, the same could be said for *all* our configuration values - 
users can shoot themselves in the foot by choosing a wrong or conflicting 
value. However, I believe this is a good thing - they will learn... IMO, we are 
not supposed to protect our users from mistakes that occur because they do not 
bother to read the documentation or understand the consequences of their 
actions. In this case, perhaps we could do something that would satisfy both 
our concerns:
   ```java
   Duration waitTime = CoreModuleProperties.KEX_PROPOSAL_SETUP_TIMEOUT.getRequired(this);
   // Compare and report in milliseconds: %d cannot format a Duration
   long waitMillis = waitTime.toMillis();
   ValidateUtils.checkTrue(waitMillis >= SOME_MIN_VALUE, "Value below threshold: %d < %d", waitMillis, SOME_MIN_VALUE);
   ```
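
   For illustration only, the same lower-bound check can be sketched with plain JDK types, without `CoreModuleProperties` or `ValidateUtils`. The names `TimeoutCheck`, `MIN_TIMEOUT` and `requireAtLeastMin`, as well as the one-second floor, are assumptions made up for this sketch and are not part of MINA SSHD.

```java
import java.time.Duration;

public class TimeoutCheck {
    // Hypothetical lower bound; a real threshold would have to be chosen by the project.
    public static final Duration MIN_TIMEOUT = Duration.ofSeconds(1);

    // Returns the value unchanged if it is at least MIN_TIMEOUT, otherwise fails fast.
    public static Duration requireAtLeastMin(Duration value) {
        if (value.compareTo(MIN_TIMEOUT) < 0) {
            throw new IllegalArgumentException(String.format(
                    "Value below threshold: %d ms < %d ms", value.toMillis(), MIN_TIMEOUT.toMillis()));
        }
        return value;
    }

    public static void main(String[] args) {
        // A sane value passes through untouched.
        System.out.println(requireAtLeastMin(Duration.ofSeconds(42)));
        // A foot-gun value such as 0 is rejected with a descriptive message.
        try {
            requireAtLeastMin(Duration.ZERO);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   A check of this kind keeps the value configurable while still rejecting settings such as 0 that would defeat the purpose of the timeout.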
   
   >> Lengthening this should never be necessary. The thread that is preparing 
this side's proposal is already running.
   
   While that seems reasonable, we cannot know what unusual usages or 
unforeseen circumstances our users might encounter. Therefore, I believe we 
should give them the flexibility to decide for themselves. 
   
   Bottom line - I trust your judgement, and if you still feel that a constant 
value is more appropriate I will not object to it. In any case, let's also 
update the README (perhaps add a KEX section) and explain this mechanism so 
that if there are future questions or suspicions of bugs around this we will 
have a clear picture of the implementation and the choice of either a constant 
or configurable timeout value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 622850)
    Time Spent: 1h 40m  (was: 1.5h)

> Race condition in KEX
> ---------------------
>
>                 Key: SSHD-1197
>                 URL: https://issues.apache.org/jira/browse/SSHD-1197
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Thomas Wolf
>            Assignee: Thomas Wolf
>            Priority: Critical
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is a race condition in the KEX implementation. A simple reproducer can 
> be obtained by modifying {{SftpTransferTest}} by inserting the following in 
> method {{doTestTransferIntegrity()}} just before the main {{try-finally}}:
> {code:java}
> CoreModuleProperties.REKEY_BLOCKS_LIMIT.set(client, Long.valueOf(65536));
> CoreModuleProperties.REKEY_BLOCKS_LIMIT.set(sshd, Long.valueOf(65536));
> try (ClientSession session = createAuthenticatedClientSession();
>     ...
> {code}
> This forces rekeying every 512kB; the test round-trips a 10MB file twice, 
> which gives ample opportunity to run into this race condition. Typically the 
> test fails very quickly and hangs.
> The hang is always caused by an async write not being cancelled when the 
> session gets disconnected. The session disconnects during KEX because the 
> race condition corrupts the KEX state.
> Most of the time the race condition causes a signature verification failure 
> during KEX, but I also got 
> "Disconnecting(ClientSessionImpl[testTransferIntegrity@/127.0.0.1:62186]): 
> SSH2_DISCONNECT_KEY_EXCHANGE_FAILED - Unable to negotiate key exchange for 
> mac algorithms (server to client) (client: null / server: 
> [email protected],[email protected],[email protected],hmac-sha2-256,hmac-sha2-512,hmac-sha1)",
>  which is easier to understand than the signature verification failure.
> The precise sequence of events in that case is
> {code:java}
>     Server                                                    Client
> 1.  thread-nio-4 requestNewKeyExchange DONE->INIT
> 2.  thread-nio-4 sendKexInit
> 3.                                                            main            requestNewKeyExchange DONE->INIT
> 4.                                                            thread-nio-3    handleKexInit
> 5.                                                            thread-nio-3      doKexNegotiation INIT->RUN
> 6.                                                            thread-nio-3        negotiate -> Exception: client proposal null
> 7.                                                            main            sendKexInit
> 8.  thread-nio-2 receive KEX_INIT      INIT->RUN
> 9.                                                            thread-nio-3    Exception caught
> 10.                                                           thread-nio-3    Disconnecting
> 11. thread-nio-5 process SSH_MSG_DISCONNECT (KexState RUN)
> {code}
> There is a window between steps 3 and 7 in 
> {{AbstractSession.requestNewKeyExchange()}} during which the KEX state is 
> INIT, but the client proposal isn't initialized yet; it's initialized only 
> after {{sendKexInit()}} has been done. However, the client already got the 
> server's KEX_INIT message, and in {{doKexNegotiation()}} proceeds as if the 
> client proposal was already set up.
> So, there are two problems here:
>  # KEX fails due to a race condition.
>  # The client hangs after disconnecting because an async write future is not 
> terminated.
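
One way to picture closing the window between steps 3 and 7 is to make the thread that handles the peer's KEX_INIT wait, with a last-resort timeout, until the local proposal exists. The following is a minimal standalone sketch of that idea using only JDK classes; it is not the code from pull request #201, and the names `ProposalGate`, `proposalReady` and `awaitOwnProposal` are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ProposalGate {
    // Completed by the thread that builds and sends our own KEX_INIT (step 7 in the trace).
    private final CompletableFuture<String> ownProposal = new CompletableFuture<>();

    // Called once our own proposal has been set up.
    public void proposalReady(String proposal) {
        ownProposal.complete(proposal);
    }

    // Called by the thread that received the peer's KEX_INIT (steps 4-6):
    // block until our own proposal exists instead of negotiating against null.
    public String awaitOwnProposal(Duration timeout) throws InterruptedException, TimeoutException {
        try {
            return ownProposal.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) throws Exception {
        ProposalGate gate = new ProposalGate();
        // Simulates the "main" thread, which finishes sendKexInit slightly later.
        Thread sender = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.proposalReady("client-proposal");
        });
        sender.start();
        // Simulates the I/O thread that already holds the peer's KEX_INIT.
        System.out.println(gate.awaitOwnProposal(Duration.ofSeconds(5)));
        sender.join();
    }
}
```

In this sketch the timeout only fires if the proposal is never published, which matches the "should actually never be hit" intent of a last-resort timeout.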



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

