[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984208#comment-15984208
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2759:
-------------------------------------------

Github user hanm commented on the issue:

    https://github.com/apache/zookeeper/pull/229
  
    Thanks for work on this! I hope we have more contributors focus on 
improving the test.
    
    The proposed fix masks the real problem - it will always success no matter 
what - even if the connection was succeeded, since the halt will clean up the 
sender worker map of the peer 0.
    
    >>  a race condition between the removal of the relevant sid from 
senderWorkerMap for peer0 and the 3 second delay from 
assertEventuallyNotConnected.
    I think there is a race, but my observation on it is slightly different. 
The race is between when peer 0 received (and finished) establish connection 
with peer 1, and the assert inside assertEventuallyNotConnected.
    
    >> may not destroy them in time for the test assertion.
    Because of the race, once peer 0 has received the connection, its sender 
worker map will has sid 1 entry forever until the death of the peer 0 itself. 
    
    I think instead of checking on peer 0, we can check peer 1. Its sender 
worker map should always be clean. We can't check peer 0 here because the 
current API does not encode the true state of the connection - it only encodes 
what happened (the connection from 1 to 0 was established on 0's side), but it 
does not encode the status of the connection (is the connection still alive or 
not - in this case peer 1 will close the socket later when it found out peer 0 
has no auth. but on peer 0 side the sender work will have sid 1 inside until 
its death, what's done has been done.). The state of peer 1 wrt its sender map 
should also reflect state of peer 0  since there is only a single connection 
between 0 and 1.
    
    Something like this should serve the purpose:
    assertEventuallyNotConnected(peer1, 0);
    
    Maybe add a sleep before it to make sure all the connection attempts were 
made (from 0  to 1 and from 1 to 0 two ways), but it's a best effort. 5 sec 
sounds enough.
    



> Flaky test: 
> org.apache.zookeeper.server.quorum.QuorumCnxManagerTest.testNoAuthLearnerConnectToAuthRequiredServerWithHigherSid
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2759
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2759
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.10
>            Reporter: Abraham Fine
>            Assignee: Abraham Fine
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to