[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002654#comment-13002654
 ] 

Vishal K commented on ZOOKEEPER-1006:
-------------------------------------

{quote}
We need to change this test to remove the sleep. Sleeps in tests are very bad, 
1) they tend to fail on slow machines (and more importantly) 2) they make "ant 
test" take a long time. 
We don't want tests to take a long time. I did a bunch of work a year or two 
ago to remove all* sleeps from tests, we shouldn't let them creep back in (I 
realize it's hard to write tests w/o sleep, but it's critical to ensure the 
tests are fast and testers can rely on the results).
{quote}

I agree that we should avoid sleeps. However, on slower machines it is very 
difficult to get a predictable outcome: the test can fail even if we wait 
longer. So can the tester really rely on the result on slower machines?

In general, on a reasonably well configured setup, I consider the failure that 
we saw here a legitimate failure. The test expects a node to join a running 
ensemble within initLimit() * tickTime() * 2. The test does not fail the 
ensemble (and so does not cause leader election on all 3 nodes) while 
restarting the peer. If a peer cannot join in two attempts, then I would think 
that something is wrong in FLE. That was my original intention for the timeout.

{quote}
btw, an easy fix for this test would be to sleep(250) in a loop around the 
thread count check. have some max loop count (equiv to say 60seconds of total 
time) to limit the failure case. In the "success" case the test will complete 
as soon as the machine can process the test.
{quote}

Sure, we can do that. On faster setups, this will let the test finish 
sooner. Since I wrote the test, I can do that (if you haven't fixed it already).
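
The loop suggested in the quote above could look roughly like the following. 
This is only a minimal sketch, not the actual patch: the PollUntil class, the 
waitFor helper, and its parameters are hypothetical names, and 
Thread.activeCount() merely stands in for whatever thread count check the test 
performs. 240 attempts at 250 ms bounds the failure case at about 60 seconds; 
in the success case the call returns as soon as the check passes.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Re-checks the condition up to maxAttempts times, sleeping intervalMs
     * between checks. Returns true as soon as the condition holds, or false
     * if it never held (or the thread was interrupted while sleeping).
     */
    public static boolean waitFor(BooleanSupplier condition,
                                  long intervalMs,
                                  int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (condition.getAsBoolean()) {
                return true; // success: stop waiting immediately
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        // One last check after the final sleep.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the test's expected thread count.
        final int expectedThreads = Thread.activeCount();

        // 240 attempts * 250 ms = roughly 60 seconds worst case.
        boolean settled = waitFor(
                () -> Thread.activeCount() <= expectedThreads, 250, 240);

        System.out.println("thread count settled: " + settled);
    }
}
```

In a JUnit test the boolean result would be passed to an assertion, so a slow 
machine only pays the full wait when the check really never passes.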


> QuorumPeer "Address already in use" -- regression in 3.3.3
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-1006
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1006
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Blocker
>         Attachments: TEST-org.apache.zookeeper.test.CnxManagerTest.txt, 
> ZOOKEEPER-1006.patch, workerthreads_badtest.txt
>
>
> CnxManagerTest.testWorkerThreads 
> See attachment, this is the first time I've seen this test fail, and it's 
> failed 2 out of the last three test runs.
> Notice (attachment) once this happens the port never becomes available.
> {noformat}
> 2011-03-02 15:53:12,425 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn$Factory@251] - 
> Accepted socket connection from /172.29.6.162:51441
> 2011-03-02 15:53:12,430 - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@639] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2011-03-02 15:53:12,430 - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@1435] - Closed 
> socket connection for client /172.29.6.162:51441 (no session established for 
> client)
> 2011-03-02 15:53:12,430 - WARN  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@82] - Exception when following 
> the leader
> java.io.EOFException
>       at java.io.DataInputStream.readInt(DataInputStream.java:375)
>       at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>       at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>       at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>       at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
>       at 
> org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:267)
>       at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:66)
>       at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2011-03-02 15:53:12,431 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@165] - shutdown called
> java.lang.Exception: shutdown Follower
>       at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>       at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> 2011-03-02 15:53:12,432 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:QuorumPeer@621] - LOOKING
> 2011-03-02 15:53:12,432 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11241:FastLeaderElection@663] - New election. My 
> id =  0, Proposed zxid = 0
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,433 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,633 - INFO  [WorkerReceiver 
> Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 
> (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
> 2011-03-02 15:53:12,633 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:QuorumPeer@655] - LEADING
> 2011-03-02 15:53:12,636 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@54] 
> - TCP NoDelay set to: true
> 2011-03-02 15:53:12,638 - INFO  
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:ZooKeeperServer@151] - Created server with 
> tickTime 1000 minSessionTimeout 2000 maxSessionTimeout 20000 datadir 
> /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
>  snapdir 
> /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
> 2011-03-02 15:53:12,639 - ERROR 
> [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@133] - Couldn't bind to port 11245
> java.net.BindException: Address already in use
>       at java.net.PlainSocketImpl.socketBind(Native Method)
>       at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365)
>       at java.net.ServerSocket.bind(ServerSocket.java:319)
>       at java.net.ServerSocket.<init>(ServerSocket.java:185)
>       at java.net.ServerSocket.<init>(ServerSocket.java:97)
>       at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:131)
>       at 
> org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:512)
>       at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:657)
> {noformat}


        
