Damien Diederen created ZOOKEEPER-3510:
------------------------------------------
Summary: Frequent 'zkServer.sh stop' failures when running C test
suite
Key: ZOOKEEPER-3510
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3510
Project: ZooKeeper
Issue Type: Bug
Reporter: Damien Diederen
As mentioned in
https://github.com/apache/zookeeper/pull/1054#discussion_r314208678 :
There is a {{sleep 3}} statement in {{zkServer.sh restart}}. I am unable to
unearth the history of that particular line, but I believe part—if not all—of
that {{sleep}} should be part of {{zkServer.sh stop}}.
I frequently observe {{FAILED TO START}} errors in the C test suite; the logs
consistently show that those are caused by {{java.net.BindException: Address
already in use}}. Adding a simple {{sleep 1}} before {{echo STOPPED}} "fixes"
it for me. I will submit an initial PR with the corresponding change and a
commit message akin to:
----
ZOOKEEPER-XXXX: Make zkServer.sh stop more reliable
Kill is asynchronous, and without the sleep, the server's TCP port can still be
busy when the next server is started—causing flaky runs of the C client's test
suite.
(It would probably be better to spin a few times, probing with ps -p.)
----
As noted above, the sleep is far from optimal, an adaptive mechanism would be
better—but I do not want to make the first iteration too complicated.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)