[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855267#comment-13855267
 ] 

Rakesh R commented on ZOOKEEPER-1833:
-------------------------------------

Hi Flavio/German,  

I closely looked at the NioNettySuiteHammerTest failure logs and took sometime 
to realize the cause. Could you please go through the below 
analysis/observation. 

Following is small part of the log messages which has been taken from 
https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/387/ build and I  
think this will help us to understand more.

>From the log I could see - after the server restarts, four letter stat command 
>has been submitted to know the server status. After issuing the command, the 
>call is not coming out (please see timestamp between 08:53:32,004 and 
>08:55:00) and then it is failing. When I searched the netty bug list, there is 
>an open issue which is talking about hanging case, please see NETTY-412. Also, 
>we have an open jira ZOOKEEPER-1179, here the description says similar issue. 
>IMHO, it would be good to skip the duplicate closure in 
>NettyServerCnxn#exceptioncaught and I had updated a patch in that jira. Please 
>review ZOOKEEPER-1179 patch.

{code}
    [junit] 2013-12-22 08:53:00,613 [myid:] - INFO  [New I/O worker 
#51:NettyServerCnxn$StatCommand@469] - Stat command output
    [junit] 2013-12-22 08:53:00,613 [myid:] - INFO  [main:QuorumBase@191] - 
127.0.0.1:11233 is accepting client connections
    [junit] 2013-12-22 08:53:00,614 [myid:] - INFO  
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11234
    [junit] 2013-12-22 08:53:00,614 [myid:] - INFO  [New I/O worker 
#56:NettyServerCnxn@632] - Processing stat command from /127.0.0.1:49539
    [junit] 2013-12-22 08:53:00,615 [myid:] - INFO  [New I/O worker 
#56:NettyServerCnxn$StatCommand@469] - Stat command output
    [junit] 2013-12-22 08:53:00,615 [myid:] - WARN  [New I/O worker 
#56:NettyServerCnxnFactory$CnxnChannelHandler@111] - Exception caught [id: 
0x59e87fb5, /127.0.0.1:49539 => /127.0.0.1:11234] EXCEPTION: 
java.io.IOException: A non-blocking socket operation could not be completed 
immediately
    [junit] java.io.IOException: A non-blocking socket operation could not be 
completed immediately
    [junit]     at sun.nio.ch.SocketDispatcher.close0(Native Method)
    [junit]     at 
sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:62)
    [junit]     at 
sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:819)
    [junit]     at 
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:228)
    [junit]     at 
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
    [junit]     at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:354)
    [junit]     at 
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81)
    [junit]     at 
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    [junit]     at org.jboss.netty.channel.Channels.close(Channels.java:812)
    [junit]     at 
org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:114)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn.cleanupWriterSocket(NettyServerCnxn.java:242)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn.access$000(NettyServerCnxn.java:57)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:315)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:306)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:678)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:791)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:211)
    [junit]     at 
org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:135)
    [junit]     at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    [junit]     at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    [junit]     at 
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    [junit]     at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    [junit]     at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    [junit]     at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    [junit]     at 
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    [junit]     at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    [junit]     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    [junit]     at java.lang.Thread.run(Thread.java:722)
    [junit] 2013-12-22 08:53:32,001 [myid:] - INFO  
[SessionTracker:ZooKeeperServer@347] - Expiring session 0x343198163eb0000, 
timeout of 30000ms exceeded
    [junit] 2013-12-22 08:53:32,001 [myid:] - INFO  [ProcessThread(sid:5 
cport:-1)::PrepRequestProcessor@494] - Processed session termination for 
sessionid: 0x343198163eb0000
    [junit] 2013-12-22 08:53:32,002 [myid:] - INFO  
[SyncThread:5:FileTxnLog@199] - Creating new log file: log.200000001
    [junit] 2013-12-22 08:53:32,002 [myid:] - WARN  
[QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11233:Follower@118] - Got zxid 0x200000001 
expected 0x1
    [junit] 2013-12-22 08:53:32,002 [myid:] - INFO  
[SyncThread:1:FileTxnLog@199] - Creating new log file: log.200000001
    [junit] 2013-12-22 08:53:32,003 [myid:] - WARN  
[QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11235:Follower@118] - Got zxid 0x200000001 
expected 0x1
    [junit] 2013-12-22 08:53:32,003 [myid:] - WARN  
[QuorumPeer[myid=4]0.0.0.0/0.0.0.0:11236:Follower@118] - Got zxid 0x200000001 
expected 0x1
    [junit] 2013-12-22 08:53:32,003 [myid:] - INFO  
[SyncThread:4:FileTxnLog@199] - Creating new log file: log.200000001
    [junit] 2013-12-22 08:53:32,003 [myid:] - INFO  
[SyncThread:3:FileTxnLog@199] - Creating new log file: log.200000001
    [junit] 2013-12-22 08:53:32,003 [myid:] - WARN  
[QuorumPeer[myid=2]0.0.0.0/0.0.0.0:11234:Follower@118] - Got zxid 0x200000001 
expected 0x1
    [junit] 2013-12-22 08:53:32,004 [myid:] - INFO  
[SyncThread:2:FileTxnLog@199] - Creating new log file: log.200000001
    [junit] 2013-12-22 08:55:00,605 [myid:] - INFO  [main:ClientBase@238] - 
server 127.0.0.1:11234 not up java.net.SocketTimeoutException: Read timed out
    [junit] 2013-12-22 08:55:00,605 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED testHammer
    [junit] java.lang.AssertionError: waiting for server up
    [junit]     at org.junit.Assert.fail(Assert.java:91)
    [junit]     at org.junit.Assert.assertTrue(Assert.java:43)
    [junit]     at 
org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:188)
    [junit]     at 
org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:113)
    [junit]     at 
org.apache.zookeeper.test.AsyncHammerTest.restart(AsyncHammerTest.java:61)
    [junit]     at 
org.apache.zookeeper.test.AsyncHammerTest.testHammer(AsyncHammerTest.java:193)
    .....
    .....
    [junit] 2013-12-22 08:55:00,606 [myid:] - INFO  [main:ZKTestCase$1@65] - 
FAILED testHammer
    .....
    .....
    [junit] 2013-12-22 08:55:00,696 [myid:] - INFO  [main:ZKTestCase$1@55] - 
FINISHED testHammer
    [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 185.755 sec
    [junit] Test org.apache.zookeeper.test.NioNettySuiteHammerTest FAILED
    [junit] Running org.apache.zookeeper.test.NioNettySuiteTest
{code}

> fix windows build
> -----------------
>
>                 Key: ZOOKEEPER-1833
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1833
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: Michi Mutsuzaki
>            Assignee: Michi Mutsuzaki
>            Priority: Blocker
>             Fix For: 3.4.6
>
>         Attachments: TEST-org.apache.zookeeper.test.QuorumTest.zip, 
> ZOOKEEPER-1833-b3.4.patch, ZOOKEEPER-1833.patch, ZOOKEEPER-1833.patch
>
>
> A bunch of 3.4 tests are failing on windows.
> {noformat}
>     [junit] 2013-12-06 08:40:59,692 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testEarlyLeaderAbandonment
>     [junit] 2013-12-06 08:41:10,472 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testHighestZxidJoinLate
>     [junit] 2013-12-06 08:45:31,085 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testUpdatingEpoch
>     [junit] 2013-12-06 08:55:34,630 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testObserversHammer
>     [junit] 2013-12-06 08:55:59,889 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncExistsFailure_NoNode
>     [junit] 2013-12-06 08:56:00,571 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetACL
>     [junit] 2013-12-06 08:56:02,626 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildrenEmpty
>     [junit] 2013-12-06 08:56:03,491 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildrenSingle
>     [junit] 2013-12-06 08:56:11,276 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildrenTwo
>     [junit] 2013-12-06 08:56:13,878 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildrenFailure_NoNode
>     [junit] 2013-12-06 08:56:16,294 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildren2Empty
>     [junit] 2013-12-06 08:56:18,622 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildren2Single
>     [junit] 2013-12-06 08:56:21,224 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildren2Two
>     [junit] 2013-12-06 08:56:23,738 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetChildren2Failure_NoNode
>     [junit] 2013-12-06 08:56:26,058 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetData
>     [junit] 2013-12-06 08:56:28,482 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAsyncGetDataFailure_NoNode
>     [junit] 2013-12-06 08:57:35,527 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testStartupFailureCreate
>     [junit] 2013-12-06 08:57:38,645 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testStartupFailureSet
>     [junit] 2013-12-06 08:57:41,261 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testStartupFailureSnapshot
>     [junit] 2013-12-06 08:59:22,222 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testClientWithWatcherObj
>     [junit] 2013-12-06 09:00:05,592 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testClientCleanup
>     [junit] 2013-12-06 09:01:24,113 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testBindByAddress
>     [junit] 2013-12-06 09:02:14,123 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testClientwithoutWatcherObj
>     [junit] 2013-12-06 09:05:56,461 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testZeroWeightQuorum
>     [junit] 2013-12-06 09:08:18,747 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testResyncByDiffAfterFollowerCrashes
>     [junit] 2013-12-06 09:09:42,271 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testFourLetterWords
>     [junit] 2013-12-06 09:14:03,770 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testLE
>     [junit] 2013-12-06 09:46:30,002 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testHierarchicalQuorum
>     [junit] 2013-12-06 09:50:26,912 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testHammerBasic
>     [junit] 2013-12-06 09:51:07,604 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testQuotaWithQuorum
>     [junit] 2013-12-06 09:52:41,515 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testNull
>     [junit] 2013-12-06 09:53:22,648 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testDeleteWithChildren
>     [junit] 2013-12-06 09:56:49,061 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testClientwithoutWatcherObj
>     [junit] 2013-12-06 09:58:27,705 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testGetView
>     [junit] 2013-12-06 09:59:07,856 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testViewContains
>     [junit] 2013-12-06 10:01:31,418 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testSessionMoved
>     [junit] 2013-12-06 10:04:50,542 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testMultiToFollower
>     [junit] 2013-12-06 10:07:55,361 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testBehindLeader
>     [junit] 2013-12-06 10:10:57,439 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testLateLogs
>     [junit] 2013-12-06 10:12:05,336 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testConnectionEvents
>     [junit] 2013-12-06 10:14:02,781 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testRecovery
>     [junit] 2013-12-06 10:14:37,220 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testFail
>     [junit] 2013-12-06 10:14:46,925 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testRestoreCommittedLog
>     [junit] 2013-12-06 10:15:30,109 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAuth
>     [junit] 2013-12-06 10:16:09,256 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAuth
>     [junit] 2013-12-06 10:16:44,586 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAuth
>     [junit] 2013-12-06 10:17:19,222 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testBadSaslAuthNotifiesWatch
>     [junit] 2013-12-06 10:17:54,239 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAuthFail
>     [junit] 2013-12-06 10:18:35,623 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAuth
>     [junit] 2013-12-06 10:18:47,094 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testAuth
>     [junit] 2013-12-06 10:19:06,770 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testCreateAfterCloseShouldFail
>     [junit] 2013-12-06 10:20:23,884 [myid:] - INFO  [main:ZKTestCase$1@65] - 
> FAILED testBasic
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to