I forgot to fill in the name of the test giving the connection errors below. It 
is testFirstServerDown in Zookeeper_simpleSystem (TestClient.cc).

-Flavio

> On 04 Jul 2016, at 23:53, Flavio Junqueira <[email protected]> wrote:
> 
>> 
>> On 04 Jul 2016, at 22:01, Michael Han <[email protected]> wrote:
>> 
>> Both Java and C unit tests coming with 3.5.2-alpha passed for me in 5 runs.
>> Are the failed tests deterministically reproducible?
> 
> They fail consistently for me. When I run xxx, I get the output below in the 
> logs. It looks like the client tries 127.0.0.1:22181 only once and after 
> that only tries 127.0.0.1:22182, which sounds wrong to me:
> 
> 2016-07-04 15:04:08,523:33750:ZOO_INFO@zookeeper_init_internal@1111: 
> Initiating client connection, host=127.0.0.1:22182,127.0.0.1:22181 
> sessionTimeout=10000 watcher=0x447050 sessionId=0 sessionPasswd=<null> 
> context=0x7fff8e504910 flags=0
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22181] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,523:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,524:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> 2016-07-04 15:04:09,524:33750:ZOO_ERROR@handle_socket_error_msg@2350: Socket 
> [127.0.0.1:22182] zk retcode=-4, errno=111(Connection refused): server 
> refused to accept the client
> <This line keeps repeating until the test times out>
> 
> Also, if you check ZK-2463, it looks like the multi tests are failing 
> silently. They are timing out, but the framework isn't picking it up. I 
> haven't had a chance to look at these multi tests to determine whether it is 
> a timing issue or something else.
> 
>> If not, it seems we
>> have more flaky tests related to threading / timing that need to be taken
>> care of, and they don't sound like blockers for the release to me.
>> 
> 
> From what I can tell, none of these issues are new, so I have no reason to 
> suspect that an issue we resolved for 3.5.2 is introducing these problems. If 
> we are to be strict, then we cannot release it, but I'd say we benefit from 
> it still being alpha and proceed. We are solving a number of issues that are 
> good to have out. For 3.5.3, I think we really need to spend some time on the 
> C client.
> 
> -Flavio 
> 
>> On Sun, Jul 3, 2016 at 9:48 PM, Rakesh Radhakrishnan <[email protected]>
>> wrote:
>> 
>>>>> I'm suggesting as a blocker for 3.5.3, I think we should proceed with
>>> 3.5.2 as is and give some love to the C client in the next release.
>>> 
>>> Since the current release is alpha, I also feel it's OK to go ahead with RC1
>>> and address the C client issue in 3.5.3. That way we'll get more folks
>>> trying it out and eventually stabilize the 3.5 version. We should probably
>>> listen to others' opinions as well.
>>> 
>>> -Rakesh
>>> 
>>> On Mon, Jul 4, 2016 at 12:32 AM, Flavio Junqueira <[email protected]> wrote:
>>> 
>>>> 
>>>>> On 03 Jul 2016, at 17:53, Chris Nauroth <[email protected]> wrote:
>>>>> 
>>>>> For my part, I got a successful full test run from RC1 before starting
>>>>> the [VOTE].  The problem with the silent failure of multi tests could have
>>>>> snuck past me easily though.  (Flavio, thank you for filing
>>>>> ZOOKEEPER-2463.)  I'm curious to hear test results from others who are
>>>>> trying RC1.
>>>> 
>>>> The test failures seem to be related to test timing, not bugs, but I
>>>> haven't been able to confirm for the last two I mentioned. Granted that
>>>> timing is in some sense a bug, all I'm saying is that it doesn't seem to
>>>> indicate a regression or anything.
>>>> 
>>>>> 
>>>>> It looks like we also need an issue to track updating the copyright
>>>>> notice in the docs.  I don't believe this is an ASF compliance problem in the
>>>>> same way that an erroneous NOTICE file would be, so I propose that we
>>>>> address it in 3.5.3.
>>>> 
>>>> Agreed, we need an issue for that.
>>>> 
>>>>> 
>>>>> Flavio, you suggested filing a blocker for the ZooKeeperQuorumServer.cc
>>>>> failure.  Did you want that targeted to 3.5.2 or 3.5.3?
>>>>> 
>>>> 
>>>> I'm suggesting it as a blocker for 3.5.3. I think we should proceed with
>>>> 3.5.2 as is and give some love to the C client in the next release.
>>>> 
>>>>> Overall, how are people feeling about the RC1 [VOTE] at this point?  Is
>>>>> anyone considering a -1, or shall we proceed (keeping in mind it's an
>>>>> alpha) with the intent of fixing things in a more rapid 3.5.3 release
>>>>> cycle?
>>>> 
>>>> I'd say we proceed.
>>>> 
>>>> -Flavio
>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 7/3/16, 8:43 AM, "Flavio Junqueira" <[email protected]> wrote:
>>>>> 
>>>>>> The issue with the TestReconfigServer test is that the client port is
>>>>>> still in use and we get a bind exception, which prevents the server from
>>>>>> starting. To verify this locally, I simply added some code to retry, and
>>>>>> it works fine with that change. Going forward we need a better fix.
>>>>>> 
>>>>>> I haven't been able to figure out the issue with the
>>>>>> Zookeeper_simpleSystem tests yet.
>>>>>> 
>>>>>> I have also found something strange with the multi tests. I have created
>>>>>> ZK-2463 for this problem and made it a blocker for 3.5.3.
>>>>>> 
>>>>>> -Flavio
>>>>>> 
>>>>>>> On 03 Jul 2016, at 15:25, Flavio Junqueira <[email protected]> wrote:
>>>>>>> 
>>>>>>> I have spun up a new Ubuntu VM to check the C failures. I get three
>>>>>>> failures with the new installation:
>>>>>>> 
>>>>>>> Zookeeper_simpleSystem::testFirstServerDown : assertion : elapsed 10911
>>>>>>> tests/TestClient.cc:411: Assertion: equality assertion failed
>>>>>>> [Expected: -101, Actual  : -4]
>>>>>>> tests/TestClient.cc:322: Assertion: assertion failed [Expression:
>>>>>>> ctx.waitForConnected(zk)]
>>>>>>> Failures !!!
>>>>>>> Run: 43   Failure total: 2   Failures: 2   Errors: 0
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> TestReconfigServer::testRemoveFollower/usr/bin/java
>>>>>>> ZooKeeper JMX enabled by default
>>>>>>> Using config: ./../../build/test/test-cppunit/conf/0.conf
>>>>>>> Starting zookeeper ... FAILED TO START
>>>>>>> zktest-mt: tests/ZooKeeperQuorumServer.cc:61: void
>>>>>>> ZooKeeperQuorumServer::start(): Assertion `system(command.c_str()) == 0'
>>>>>>> failed.
>>>>>>> /bin/bash: line 5: 47059 Aborted                 (core dumped)
>>>>>>> ZKROOT=./../.. CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover.jar
>>>>>>> ${dir}$tst
>>>>>>> 
>>>>>>> -Flavio
>>>>>>> 
>>>>>>> 
>>>>>>>> On 03 Jul 2016, at 15:19, Edward Ribeiro <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi Flavio,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Jul 3, 2016 at 5:54 AM, Flavio Junqueira <[email protected]> wrote:
>>>>>>>> Hey Eddie,
>>>>>>>> 
>>>>>>>> A few comments on your points:
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - the copyright notice is still dated "2008-2013". Is it worth
>>>>>>>>> updating to the current year?
>>>>>>>> 
>>>>>>>> Where are you seeing this? The NOTICE file is correct from what I can
>>>>>>>> see.
>>>>>>>> 
>>>>>>>> Oops, sorry. I was referring to the PDFs and HTMLs in the docs/
>>>>>>>> folder. Even after running "ant docs" the footnote has "2008-2013"
>>>>>>>> copyright. Images attached.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> - I consistently ran into a test error equal to the one at
>>>>>>>>> https://builds.apache.org/job/ZooKeeper-trunk/2982/console
>>>>>>>> 
>>>>>>>> I think this is ZK-2152, which Chris has moved to 3.5.3, so even
>>>>>>>> though it isn't ideal, it is expected.
>>>>>>>> 
>>>>>>>> Got it. :)
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> - Also this one:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> https://mail-archives.apache.org/mod_mbox/zookeeper-dev/201601.mbox/%3C1279938263.1283.1453526737790.JavaMail.jenkins@crius%3E
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> I don't know if there is a jira for this one. If not, better create
>>>>>>>> one and make it a blocker.
>>>>>>>> 
>>>>>>>> Okay, I'm going to look for one and do this.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> - In fact, there were 14 failing tests total (I suspect all of them are
>>>>>>>>> related to the C tests). Any ideas? A couple of flaky tests?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> In general, having a release with so many tests failing is bad. I
>>>>>>>> didn't get these test failures, so it would be great to report them or
>>>>>>>> make sure that there are jiras for them.
>>>>>>>> 
>>>>>>>> Right. I was only skeptical of my own tests because I ran the unit
>>>>>>>> tests on a relatively old Ubuntu version, even though it was Java 1.7.
>>>>>>>> So, I will run the tests on a newer Linux soon just to make sure it
>>>>>>>> was not a false negative.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Test failures are possibly an indication that something is bad with
>>>>>>>> the RC, so I wouldn't have +1'ed it if I had observed all those. It might
>>>>>>>> be ok given that this is still labeled alpha.
>>>>>>>> 
>>>>>>>> Excuse me. I only +1'ed because I suspect the errors are restricted
>>>>>>>> to the C binding and my Ubuntu version, etc. But I should have
>>>>>>>> researched further before giving +1, nevertheless. Point taken. :)
>>>>>>>> 
>>>>>>>> Edward
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Cheers
>> Michael.
