Btw, from the stack traces, all of the servers seem to be in a healthy state: they have completed leader election and are following properly.
From my phone
On Nov 8, 2011 2:01 PM, "Camille Fournier" <[email protected]> wrote:

> Anyone know why Patrick's log file might be showing a lot of this
> before the error?
>
> 2011-11-06 01:02:39,905 [myid:2] - INFO [Thread-76:NIOServerCnxn$StatCommand@655] - Stat command output
>
> This test never does a stat call, it uses a ZK client to connect in.
> This seems strange, perhaps the issue is a test setup one?
>
> C
>
> On Mon, Nov 7, 2011 at 6:23 PM, Patrick Hunt <[email protected]> wrote:
> > That's fine (direction re 1-4). However my CI branch 3.4 build failed
> > over the w/e (once out of four runs). This is AFTER "Preparing for
> > release 3.4.0 - take 2" was applied (so testing includes 1270, 1264,
> > etc...)
> >
> > Notice testEarlyLeaderAbandonment is failing. I have attached the log
> > file to ZOOKEEPER-1270 JIRA:
> > https://issues.apache.org/jira/secure/attachment/12502838/testEarlyLeaderAbandonment5.txt.gz
> >
> > java.lang.RuntimeException: Waiting too long
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:324)
> >     at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testEarlyLeaderAbandonment(QuorumPeerMainTest.java:195)
> >     at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> >
> > Should I reopen 1270, or a new jira, or... ? LMK.
> >
> > Note - I'm feeling quite ill so I have limited time to provide f/b &
> > test for the next day or so.
> >
> > Patrick
> >
> > On Sat, Nov 5, 2011 at 12:22 PM, Flavio Junqueira <[email protected]> wrote:
> >> I'm fine with your proposal. -Flavio
> >>
> >> On Nov 5, 2011, at 8:15 PM, Camille Fournier wrote:
> >>
> >>> 2 has been flaky for so long, not sure whether it's worth being a blocker.
> >>> The AsyncHammerTests never pass for me locally. Not sure if it's a
> >>> problem or not... I am tempted to go with Mahadev on this and get this
> >>> 3.4 release out the door. I would be happy to help manage a 3.4.1
> >>> release soon thereafter if we find serious issues.
> >>>
> >>> C
> >>>
> >>> On Sat, Nov 5, 2011 at 3:01 PM, Flavio Junqueira <[email protected]> wrote:
> >>>>
> >>>> If 2) is flakey, we need to fix it, no?
> >>>>
> >>>> -Flavio
> >>>>
> >>>> On Nov 5, 2011, at 6:14 PM, Patrick Hunt wrote:
> >>>>
> >>>>> I ran the 1270-1194 patch continually overnight (trunk) in my ci env,
> >>>>> after ~25 test runs I saw 4 failures:
> >>>>>
> >>>>> 1) #402 - QuorumTest.testFollowersStartAfterLeader
> >>>>> 2) #407 - org.apache.zookeeper.test.FLETest.testLE
> >>>>> 3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
> >>>>> 4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
> >>>>>
> >>>>> 1) client could not connect to reestablished quorum: giving up after
> >>>>> 30+ seconds.
> >>>>> 2) known flakey test
> >>>>> 3) QP failed to shutdown in 30 seconds:
> >>>>> QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
> >>>>> 4) QP failed to shutdown in 30 seconds:
> >>>>> QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222
> >>>>>
> >>>>> On the plus side no "testearlyleaderabandon" failures.
> >>>>>
> >>>>> On the minus side 3/4 are a bit worrysome. Searching back through all
> >>>>> my previous failures I don't see this happening. Perhaps these changes
> >>>>> have shifted some timing? My main concern is that this might be caused
> >>>>> directly by the patch itself....
> >>>>>
> >>>>> Patrick
> >>>>
> >>>> flavio
> >>>> junqueira
> >>>>
> >>>> research scientist
> >>>>
> >>>> [email protected]
> >>>> direct +34 93-183-8828
> >>>>
> >>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> >>>> phone (408) 349 3300 fax (408) 349 3301
> >>>>
> >>>>
> >>
> >> flavio
> >> junqueira
> >>
> >> research scientist
> >>
> >> [email protected]
> >> direct +34 93-183-8828
> >>
> >> avinguda diagonal 177, 8th floor, barcelona, 08018, es
> >> phone (408) 349 3300 fax (408) 349 3301
> >>
> >>
> >
>
