[ https://issues.apache.org/jira/browse/ZOOKEEPER-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718062#comment-13718062 ]

Flavio Junqueira commented on ZOOKEEPER-1733:
---------------------------------------------

Thanks for looking into it, Jeffrey. Are we still talking about the 3.4 branch,
or is this about trunk too? Here are a couple of comments:

- On top of making sure that a quorum leaves leader election, we should also
check that the leader ends up thinking that it is the leader. It is a simple
sanity check, and I don't see a reason to remove it if this is not about the
test failures on Windows.
- If you're still focusing on 3.4, then the best path I can see is to apply
ZOOKEEPER-1292 to branch 3.4. If it still doesn't work in trunk, as we have
observed, then we need to work on a patch on top of ZOOKEEPER-1292. I'm not
comfortable removing checks, though, unless it is clear that they are not
verifying anything.
                
> FLETest#testLE is flaky on windows boxes
> ----------------------------------------
>
>                 Key: ZOOKEEPER-1733
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1733
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: Jeffrey Zhong
>
> FLETest#testLE fails intermittently on Windows boxes. The reason is that in
> LEThread#run() we have:
> {code}
> if (leader == i) {
>     synchronized (finalObj) {
>         successCount++;
>         if (successCount > (count / 2)) finalObj.notify();
>     }
>     break; // leaves the while loop, so run() returns and the thread exits
> }
> {code}
> Basically, once we have a confirmed leader, the leader thread exits because of
> the "break" out of the while loop.
> Then, in the verification step, we check whether the leader thread is alive,
> as follows:
> {code}
>        if(threads.get((int) leader).isAlive()){
>            Assert.fail("Leader hasn't joined: " + leader);
>        }
> {code}
> On Windows boxes, the verification step above fails frequently because the
> leader thread has most likely already exited.
> Do we know why we have the leader-alive verification step, given that only the
> leader thread can bump successCount above count/2?
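The race in the quoted description can be modeled in a few lines of plain Java. In this hypothetical, self-contained sketch (not the actual FLETest or LEThread code), worker threads bump `successCount` under `finalObj` and then fall off the end of `run()`, and the main thread waits for a majority before inspecting a designated leader thread. It assumes a bounded `Thread.join(long)` before the `isAlive()` check is acceptable for the test; that is one way to keep the check while removing its dependence on thread scheduling, not necessarily the fix the project adopted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the FLETest pattern; names mirror the snippet but this
// is not the real test code.
class RaceSketch {
    private final Object finalObj = new Object();
    private int successCount = 0; // guarded by finalObj

    /** Returns true if the designated leader thread has terminated when we check it. */
    boolean runAndCheckLeaderJoined(int count, int leaderIndex, long joinTimeoutMs)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            threads.add(new Thread(() -> {
                synchronized (finalObj) {
                    successCount++;
                    if (successCount > count / 2) {
                        finalObj.notify(); // wake the waiting main thread
                    }
                }
                // Returning from run() here plays the role of the "break".
            }));
        }
        synchronized (finalObj) {
            // Starting the workers while holding the monitor means none of them can
            // increment before we start waiting, so no notify() is missed.
            for (Thread t : threads) {
                t.start();
            }
            // Loop guards against spurious wakeups.
            while (successCount <= count / 2) {
                finalObj.wait();
            }
        }
        Thread leader = threads.get(leaderIndex);
        // A bare leader.isAlive() here is only a snapshot: depending on scheduling,
        // the leader may or may not have returned from run() yet. Joining with a
        // timeout gives it a bounded amount of time to finish before we check.
        leader.join(joinTimeoutMs);
        return !leader.isAlive();
    }
}
```

For example, `new RaceSketch().runAndCheckLeaderJoined(5, 0, 5000)` waits for a majority of five workers and then gives worker 0 up to five seconds to terminate before reporting whether it has.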

--
This message is automatically generated by JIRA.