[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499350#comment-15499350 ]
Rakesh R commented on ZOOKEEPER-2383: ------------------------------------- Thanks [~fpj] for the comments. bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. Don't we have to also update it here: I think, I have handled this case and the following condition present in [ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch]. Am I missing any other case apart from the below line of code. {code} @@ -735,7 +735,7 @@ bb.flip(); ZooKeeperServer zks = this.zkServer; - if (zks == null) { + if (zks == null || !zks.isRunning()) { throw new IOException("ZK down"); } if (initialized) { {code} bq. but it never exists as the client keeps trying to connect. It sounds like some thread is hanging and not letting the test framework exit. Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the other side {{SimpleZooKeeperServer#startup}} thread is waiting at startupDelayLatch.await() and this is causing an indefinite wait. How about adding a timed out like below: {code} startupDelayLatch.await(15, TimeUnit.SECONDS); {code} bq. This sentence doesn't make much sense to me: Since zk server is not started createsession method to be invoked I will modify this to "Failed to establish ZooKeeper client connection" bq. Please reduce the test case timeout to no longer than 30s. Agreed. FYI, apart from the above, I had fixed [Michael's comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673] and [Raul's comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826] in trunk patch, need to rebase branch-3-4 patch to incorporate these comments. > Startup race in ZooKeeperServer > ------------------------------- > > Key: ZOOKEEPER-2383 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383 > Project: ZooKeeper > Issue Type: Bug > Components: jmx, server > Affects Versions: 3.4.8 > Reporter: Steve Rowe > Assignee: Rakesh R > Priority: Blocker > Fix For: 3.4.10, 3.5.3, 3.6.0 > > Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, > ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, > ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch, > zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log > > > In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 > (SOLR-8724) I ran into test failures where attempts to create a node in a > newly started standalone ZooKeeperServer were failing because of an assertion > in MBeanRegistry. > ZooKeeperServer.startup() first sets up its request processor chain then > registers itself in JMX, but if a connection comes in before the server's JMX > registration happens, registration of the connection will fail because it > trips the assertion that (effectively) its parent (the server) has already > registered itself. > {code:java|title=ZooKeeperServer.java} > public synchronized void startup() { > if (sessionTracker == null) { > createSessionTracker(); > } > startSessionTracker(); > setupRequestProcessors(); > registerJMX(); > state = State.RUNNING; > notifyAll(); > } > {code} > {code:java|title=MBeanRegistry.java} > public void register(ZKMBeanInfo bean, ZKMBeanInfo parent) > throws JMException > { > assert bean != null; > String path = null; > if (parent != null) { > path = mapBean2Path.get(parent); > assert path != null; > } > {code} > This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this > issue with ZK 3.4.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332)