Rakesh R commented on ZOOKEEPER-2383:

Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
 Am I missing any other case apart from the below line of code.
@@ -735,7 +735,7 @@
                         ZooKeeperServer zks = this.zkServer;
-                        if (zks == null) {
+                        if (zks == null || !zks.isRunning()) {
                             throw new IOException("ZK down");
                         if (initialized) {

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding a timed out like below:
        startupDelayLatch.await(15, TimeUnit.SECONDS);

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Failed to establish ZooKeeper client connection"

bq. Please reduce the test case timeout to no longer than 30s.

FYI, apart from the above, I had fixed [Michael's 
 and [Raul's 
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.

> Startup race in ZooKeeperServer
> -------------------------------
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 3.4.10, 3.5.3, 3.6.0
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch, 
> zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 

This message was sent by Atlassian JIRA

Reply via email to