[
https://issues.apache.org/jira/browse/ZOOKEEPER-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855267#comment-13855267
]
Rakesh R commented on ZOOKEEPER-1833:
-------------------------------------
Hi Flavio/German,
I closely looked at the NioNettySuiteHammerTest failure logs and took sometime
to realize the cause. Could you please go through the below
analysis/observation.
Following is small part of the log messages which has been taken from
https://builds.apache.org/job/ZooKeeper-3.4-WinVS2008_java/387/ build and I
think this will help us to understand more.
>From the log I could see - after the server restarts, four letter stat command
>has been submitted to know the server status. After issuing the command, the
>call is not coming out (please see timestamp between 08:53:32,004 and
>08:55:00) and then it is failing. When I searched the netty bug list, there is
>an open issue which is talking about hanging case, please see NETTY-412. Also,
>we have an open jira ZOOKEEPER-1179, here the description says similar issue.
>IMHO, it would be good to skip the duplicate closure in
>NettyServerCnxn#exceptioncaught and I had updated a patch in that jira. Please
>review ZOOKEEPER-1179 patch.
{code}
[junit] 2013-12-22 08:53:00,613 [myid:] - INFO [New I/O worker
#51:NettyServerCnxn$StatCommand@469] - Stat command output
[junit] 2013-12-22 08:53:00,613 [myid:] - INFO [main:QuorumBase@191] -
127.0.0.1:11233 is accepting client connections
[junit] 2013-12-22 08:53:00,614 [myid:] - INFO
[main:FourLetterWordMain@43] - connecting to 127.0.0.1 11234
[junit] 2013-12-22 08:53:00,614 [myid:] - INFO [New I/O worker
#56:NettyServerCnxn@632] - Processing stat command from /127.0.0.1:49539
[junit] 2013-12-22 08:53:00,615 [myid:] - INFO [New I/O worker
#56:NettyServerCnxn$StatCommand@469] - Stat command output
[junit] 2013-12-22 08:53:00,615 [myid:] - WARN [New I/O worker
#56:NettyServerCnxnFactory$CnxnChannelHandler@111] - Exception caught [id:
0x59e87fb5, /127.0.0.1:49539 => /127.0.0.1:11234] EXCEPTION:
java.io.IOException: A non-blocking socket operation could not be completed
immediately
[junit] java.io.IOException: A non-blocking socket operation could not be
completed immediately
[junit] at sun.nio.ch.SocketDispatcher.close0(Native Method)
[junit] at
sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:62)
[junit] at
sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:819)
[junit] at
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:228)
[junit] at
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
[junit] at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:354)
[junit] at
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81)
[junit] at
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
[junit] at org.jboss.netty.channel.Channels.close(Channels.java:812)
[junit] at
org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:114)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn.cleanupWriterSocket(NettyServerCnxn.java:242)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn.access$000(NettyServerCnxn.java:57)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:315)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:306)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:678)
[junit] at
org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:791)
[junit] at
org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:211)
[junit] at
org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:135)
[junit] at
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
[junit] at
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
[junit] at
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
[junit] at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
[junit] at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
[junit] at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
[junit] at
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
[junit] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
[junit] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[junit] at java.lang.Thread.run(Thread.java:722)
[junit] 2013-12-22 08:53:32,001 [myid:] - INFO
[SessionTracker:ZooKeeperServer@347] - Expiring session 0x343198163eb0000,
timeout of 30000ms exceeded
[junit] 2013-12-22 08:53:32,001 [myid:] - INFO [ProcessThread(sid:5
cport:-1)::PrepRequestProcessor@494] - Processed session termination for
sessionid: 0x343198163eb0000
[junit] 2013-12-22 08:53:32,002 [myid:] - INFO
[SyncThread:5:FileTxnLog@199] - Creating new log file: log.200000001
[junit] 2013-12-22 08:53:32,002 [myid:] - WARN
[QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11233:Follower@118] - Got zxid 0x200000001
expected 0x1
[junit] 2013-12-22 08:53:32,002 [myid:] - INFO
[SyncThread:1:FileTxnLog@199] - Creating new log file: log.200000001
[junit] 2013-12-22 08:53:32,003 [myid:] - WARN
[QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11235:Follower@118] - Got zxid 0x200000001
expected 0x1
[junit] 2013-12-22 08:53:32,003 [myid:] - WARN
[QuorumPeer[myid=4]0.0.0.0/0.0.0.0:11236:Follower@118] - Got zxid 0x200000001
expected 0x1
[junit] 2013-12-22 08:53:32,003 [myid:] - INFO
[SyncThread:4:FileTxnLog@199] - Creating new log file: log.200000001
[junit] 2013-12-22 08:53:32,003 [myid:] - INFO
[SyncThread:3:FileTxnLog@199] - Creating new log file: log.200000001
[junit] 2013-12-22 08:53:32,003 [myid:] - WARN
[QuorumPeer[myid=2]0.0.0.0/0.0.0.0:11234:Follower@118] - Got zxid 0x200000001
expected 0x1
[junit] 2013-12-22 08:53:32,004 [myid:] - INFO
[SyncThread:2:FileTxnLog@199] - Creating new log file: log.200000001
[junit] 2013-12-22 08:55:00,605 [myid:] - INFO [main:ClientBase@238] -
server 127.0.0.1:11234 not up java.net.SocketTimeoutException: Read timed out
[junit] 2013-12-22 08:55:00,605 [myid:] - INFO
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@62] - TEST METHOD FAILED testHammer
[junit] java.lang.AssertionError: waiting for server up
[junit] at org.junit.Assert.fail(Assert.java:91)
[junit] at org.junit.Assert.assertTrue(Assert.java:43)
[junit] at
org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:188)
[junit] at
org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:113)
[junit] at
org.apache.zookeeper.test.AsyncHammerTest.restart(AsyncHammerTest.java:61)
[junit] at
org.apache.zookeeper.test.AsyncHammerTest.testHammer(AsyncHammerTest.java:193)
.....
.....
[junit] 2013-12-22 08:55:00,606 [myid:] - INFO [main:ZKTestCase$1@65] -
FAILED testHammer
.....
.....
[junit] 2013-12-22 08:55:00,696 [myid:] - INFO [main:ZKTestCase$1@55] -
FINISHED testHammer
[junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 185.755 sec
[junit] Test org.apache.zookeeper.test.NioNettySuiteHammerTest FAILED
[junit] Running org.apache.zookeeper.test.NioNettySuiteTest
{code}
> fix windows build
> -----------------
>
> Key: ZOOKEEPER-1833
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1833
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: Michi Mutsuzaki
> Assignee: Michi Mutsuzaki
> Priority: Blocker
> Fix For: 3.4.6
>
> Attachments: TEST-org.apache.zookeeper.test.QuorumTest.zip,
> ZOOKEEPER-1833-b3.4.patch, ZOOKEEPER-1833.patch, ZOOKEEPER-1833.patch
>
>
> A bunch of 3.4 tests are failing on windows.
> {noformat}
> [junit] 2013-12-06 08:40:59,692 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testEarlyLeaderAbandonment
> [junit] 2013-12-06 08:41:10,472 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testHighestZxidJoinLate
> [junit] 2013-12-06 08:45:31,085 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testUpdatingEpoch
> [junit] 2013-12-06 08:55:34,630 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testObserversHammer
> [junit] 2013-12-06 08:55:59,889 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncExistsFailure_NoNode
> [junit] 2013-12-06 08:56:00,571 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetACL
> [junit] 2013-12-06 08:56:02,626 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildrenEmpty
> [junit] 2013-12-06 08:56:03,491 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildrenSingle
> [junit] 2013-12-06 08:56:11,276 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildrenTwo
> [junit] 2013-12-06 08:56:13,878 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildrenFailure_NoNode
> [junit] 2013-12-06 08:56:16,294 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildren2Empty
> [junit] 2013-12-06 08:56:18,622 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildren2Single
> [junit] 2013-12-06 08:56:21,224 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildren2Two
> [junit] 2013-12-06 08:56:23,738 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetChildren2Failure_NoNode
> [junit] 2013-12-06 08:56:26,058 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetData
> [junit] 2013-12-06 08:56:28,482 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAsyncGetDataFailure_NoNode
> [junit] 2013-12-06 08:57:35,527 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testStartupFailureCreate
> [junit] 2013-12-06 08:57:38,645 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testStartupFailureSet
> [junit] 2013-12-06 08:57:41,261 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testStartupFailureSnapshot
> [junit] 2013-12-06 08:59:22,222 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testClientWithWatcherObj
> [junit] 2013-12-06 09:00:05,592 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testClientCleanup
> [junit] 2013-12-06 09:01:24,113 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testBindByAddress
> [junit] 2013-12-06 09:02:14,123 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testClientwithoutWatcherObj
> [junit] 2013-12-06 09:05:56,461 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testZeroWeightQuorum
> [junit] 2013-12-06 09:08:18,747 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testResyncByDiffAfterFollowerCrashes
> [junit] 2013-12-06 09:09:42,271 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testFourLetterWords
> [junit] 2013-12-06 09:14:03,770 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testLE
> [junit] 2013-12-06 09:46:30,002 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testHierarchicalQuorum
> [junit] 2013-12-06 09:50:26,912 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testHammerBasic
> [junit] 2013-12-06 09:51:07,604 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testQuotaWithQuorum
> [junit] 2013-12-06 09:52:41,515 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testNull
> [junit] 2013-12-06 09:53:22,648 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testDeleteWithChildren
> [junit] 2013-12-06 09:56:49,061 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testClientwithoutWatcherObj
> [junit] 2013-12-06 09:58:27,705 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testGetView
> [junit] 2013-12-06 09:59:07,856 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testViewContains
> [junit] 2013-12-06 10:01:31,418 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testSessionMoved
> [junit] 2013-12-06 10:04:50,542 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testMultiToFollower
> [junit] 2013-12-06 10:07:55,361 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testBehindLeader
> [junit] 2013-12-06 10:10:57,439 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testLateLogs
> [junit] 2013-12-06 10:12:05,336 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testConnectionEvents
> [junit] 2013-12-06 10:14:02,781 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testRecovery
> [junit] 2013-12-06 10:14:37,220 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testFail
> [junit] 2013-12-06 10:14:46,925 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testRestoreCommittedLog
> [junit] 2013-12-06 10:15:30,109 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAuth
> [junit] 2013-12-06 10:16:09,256 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAuth
> [junit] 2013-12-06 10:16:44,586 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAuth
> [junit] 2013-12-06 10:17:19,222 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testBadSaslAuthNotifiesWatch
> [junit] 2013-12-06 10:17:54,239 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAuthFail
> [junit] 2013-12-06 10:18:35,623 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAuth
> [junit] 2013-12-06 10:18:47,094 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testAuth
> [junit] 2013-12-06 10:19:06,770 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testCreateAfterCloseShouldFail
> [junit] 2013-12-06 10:20:23,884 [myid:] - INFO [main:ZKTestCase$1@65] -
> FAILED testBasic
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)