[
https://issues.apache.org/jira/browse/ZOOKEEPER-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15984208#comment-15984208
]
ASF GitHub Bot commented on ZOOKEEPER-2759:
-------------------------------------------
Github user hanm commented on the issue:
https://github.com/apache/zookeeper/pull/229
Thanks for work on this! I hope we have more contributors focus on
improving the test.
The proposed fix masks the real problem - it will always success no matter
what - even if the connection was succeeded, since the halt will clean up the
sender worker map of the peer 0.
>> a race condition between the removal of the relevant sid from
senderWorkerMap for peer0 and the 3 second delay from
assertEventuallyNotConnected.
I think there is a race, but my observation on it is slightly different.
The race is between when peer 0 received (and finished) establish connection
with peer 1, and the assert inside assertEventuallyNotConnected.
>> may not destroy them in time for the test assertion.
Because of the race, once peer 0 has received the connection, its sender
worker map will has sid 1 entry forever until the death of the peer 0 itself.
I think instead of checking on peer 0, we can check peer 1. Its sender
worker map should always be clean. We can't check peer 0 here because the
current API does not encode the true state of the connection - it only encodes
what happened (the connection from 1 to 0 was established on 0's side), but it
does not encode the status of the connection (is the connection still alive or
not - in this case peer 1 will close the socket later when it found out peer 0
has no auth. but on peer 0 side the sender work will have sid 1 inside until
its death, what's done has been done.). The state of peer 1 wrt its sender map
should also reflect state of peer 0 since there is only a single connection
between 0 and 1.
Something like this should serve the purpose:
assertEventuallyNotConnected(peer1, 0);
Maybe add a sleep before it to make sure all the connection attempts were
made (from 0 to 1 and from 1 to 0 two ways), but it's a best effort. 5 sec
sounds enough.
> Flaky test:
> org.apache.zookeeper.server.quorum.QuorumCnxManagerTest.testNoAuthLearnerConnectToAuthRequiredServerWithHigherSid
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2759
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2759
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10
> Reporter: Abraham Fine
> Assignee: Abraham Fine
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)