I'm currently trying to wrap up ZOOKEEPER-1292, and I can move to
early abandonment once I'm done here.
-Flavio
On Nov 8, 2011, at 1:20 AM, Camille Fournier wrote:
Sorry you're feeling bad, Patrick! We can take it from here.
I would really like to get some clarification on this test from some
of the LE experts. What does it really mean that this test is failing?
Is this the sort of failure where server startup sometimes takes a bit
longer because the leader gives up the election, or will server startup
hang completely because of it? If it's the latter, it should be a
high-priority fix for 3.4, but if it means that startup might
occasionally have to fail and retry once, it might be worth worrying
about in 3.4.1 instead.
Thoughts?
C
On Mon, Nov 7, 2011 at 6:23 PM, Patrick Hunt <[email protected]> wrote:
That's fine (direction re 1-4). However my CI branch 3.4 build failed
over the w/e (once out of four runs). This is AFTER "Preparing for
release 3.4.0 - take 2" was applied (so testing includes 1270, 1264,
etc...)
Notice testEarlyLeaderAbandonment is failing. I have attached the log
file to ZOOKEEPER-1270 JIRA:
https://issues.apache.org/jira/secure/attachment/12502838/testEarlyLeaderAbandonment5.txt.gz
java.lang.RuntimeException: Waiting too long
    at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:324)
    at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testEarlyLeaderAbandonment(QuorumPeerMainTest.java:195)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
Should I reopen 1270, open a new JIRA, or...? LMK.
Note - I'm feeling quite ill so I have limited time to provide f/b &
test for the next day or so.
Patrick
On Sat, Nov 5, 2011 at 12:22 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
I'm fine with your proposal. -Flavio
On Nov 5, 2011, at 8:15 PM, Camille Fournier wrote:
2 has been flaky for so long, not sure whether it's worth being a
blocker.
The AsyncHammerTests never pass for me locally. Not sure if it's a
problem or not... I am tempted to go with Mahadev on this and get this
3.4 release out the door. I would be happy to help manage a 3.4.1
release soon thereafter if we find serious issues.
C
On Sat, Nov 5, 2011 at 3:01 PM, Flavio Junqueira <fpj@yahoo-inc.com> wrote:
If 2) is flakey, we need to fix it, no?
-Flavio
On Nov 5, 2011, at 6:14 PM, Patrick Hunt wrote:
I ran the 1270-1194 patch continually overnight (trunk) in my ci env,
after ~25 test runs I saw 4 failures:
1) #402 - QuorumTest.testFollowersStartAfterLeader
2) #407 - org.apache.zookeeper.test.FLETest.testLE
3) #410 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
4) #415 - org.apache.zookeeper.test.AsyncHammerTest.testHammer
1) client could not connect to reestablished quorum: giving up after
30+ seconds.
2) known flakey test
3) QP failed to shutdown in 30 seconds:
QuorumPeer[myid=3]0.0.0.0/0.0.0.0:11224
4) QP failed to shutdown in 30 seconds:
QuorumPeer[myid=1]0.0.0.0/0.0.0.0:11222
On the plus side no "testearlyleaderabandon" failures.
On the minus side 3/4 are a bit worrisome. Searching back through all
my previous failures I don't see this happening. Perhaps these changes
have shifted some timing? My main concern is that this might be caused
directly by the patch itself....
Patrick
flavio
junqueira
research scientist
[email protected]
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300 fax (408) 349 3301