[ https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186556#comment-15186556 ]
Rakesh R commented on ZOOKEEPER-2383:
-------------------------------------

Thanks [~steve_rowe] for reporting this issue and for the good analysis.

bq. According to git blame, the latest changes around the startup method in ZooKeeperServer are due to ZOOKEEPER-1907, which actually turned out to be quite problematic, so this could be another issue due to that patch, I'm not sure.

[~fpj], sure, I'm happy to investigate this. To understand the impact of ZOOKEEPER-1907, I first took the code from before the ZOOKEEPER-1907 commit (version {{da3e7e0d4b66ac5a25d40ae2d0102b1b57994b62}}). I debugged that code and was able to reproduce the issue even without the ZOOKEEPER-1907 changes.

Coming back to the issues reported in this jira, there are two. IIUC, both cases are caused by a race between server startup and the processing of a client connection request. I have tried to work out the cause; please see the sequences below that lead to the trouble.

# NullPointerException while creating session
{code}
2016-03-08 11:29:00,374 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:55555:NIOServerCnxnFactory@213] - Ignoring unexpected runtime exception
java.lang.NullPointerException
	at org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:569)
{code}
+Thread-1: Starting the server+
1=> Invokes {{cnxnFactory.startup(server);}}
2=> Starts the NIOServerCxn.Factory thread and registers OP_ACCEPT to accept connections
3=> Sets the zookeeper server on the connection factory
4=> Loads the zookeeper data
5=> Assume the server is about to invoke {{zks.startup();}} and {{sessionTracker}} is not yet initialized.
+Thread-2: creating client connection+
1=> Sends a connection request to the server
2=> NIOServerCnxn reads the request and invokes {{NIOServerCnxn#readConnectRequest()}}
3=> It then calls {{zkServer.processConnectRequest(this, incomingBuffer);}}
4=> Processing the request needs the {{sessionTracker}} reference, but it is not yet initialized because the server is still in the startup phase, which causes the NPE.

# MBeanRegistry throws an assertion error because the parent doesn't exist
{code}
2016-03-08 11:29:00,449 [myid:] - WARN [Thread-0:MBeanRegistry@118] - registered bean 'InMemoryDataTree' with parent 'StandaloneServer_port55555' at path '/StandaloneServer_port55555'
java.lang.Throwable:
	at org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:116)
{code}
+Thread-1: Starting the server+
1=> Invokes {{cnxnFactory.startup(server);}}
2=> Starts the NIOServerCxn.Factory thread and registers OP_ACCEPT to accept connections
3=> Sets the zookeeper server on the connection factory
4=> Loads the zookeeper data
5=> The server invokes {{zks.startup();}}
6=> Starts the session tracker
7=> Finishes setting up the RequestProcessors
8=> Invokes {{ZooKeeperServer#registerJMX();}}
9=> Now assume ZooKeeperServer has initialized {{jmxServerBean = new ZooKeeperServerBean(this);}} and is about to register the bean in the registry via {{MBeanRegistry.getInstance().register(jmxServerBean, null);}}

+Thread-2: creating client connection+
1=> Sends a connection request to the server
2=> NIOServerCnxn reads the request and invokes {{NIOServerCnxn#readConnectRequest()}}
3=> It then calls {{zkServer.processConnectRequest(this, incomingBuffer);}}
4=> Since all the request processors are ready, it successfully creates the session and goes on to register the connection bean
5=> It now invokes {{zkServer.finishSessionInit()}}, which calls {{serverCnxnFactory.registerConnection(cnxn);}} and hits the path error, because the parent server bean has not been registered yet.
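
To make the first sequence concrete, here is a minimal, self-contained sketch of the same kind of race. It is a toy model, not ZooKeeper code: {{ToyServer}}, {{ToySessionTracker}} and both {{processConnectRequest*}} methods are hypothetical names invented for illustration. The latch-based guard is only one possible way to make a connection wait for startup; I am not claiming it is the fix that should go into the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the startup race; NOT the real ZooKeeperServer code. */
public class StartupRaceSketch {

    /** Hypothetical stand-in for the session tracker created during startup. */
    static class ToySessionTracker {
        private final AtomicLong nextId = new AtomicLong(1);
        long createSession() { return nextId.getAndIncrement(); }
    }

    /** Hypothetical stand-in for the server: sessionTracker is null until startup(). */
    static class ToyServer {
        private volatile ToySessionTracker sessionTracker;
        private final CountDownLatch started = new CountDownLatch(1);

        void startup() {
            sessionTracker = new ToySessionTracker();
            started.countDown(); // signal that connections may now be processed
        }

        /** Unguarded path: dereferences sessionTracker even if startup() has not run. */
        long processConnectRequestUnguarded() {
            return sessionTracker.createSession();
        }

        /** Guarded path (assumed mitigation): wait until startup() has finished. */
        long processConnectRequestGuarded(long timeoutMs) throws InterruptedException {
            if (!started.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("server not started");
            }
            return sessionTracker.createSession();
        }
    }

    /** Returns true if an unguarded connect issued before startup() hits the NPE. */
    public static boolean unguardedConnectFailsBeforeStartup() {
        ToyServer server = new ToyServer();
        try {
            server.processConnectRequestUnguarded();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Runs a guarded connect concurrently with startup(); returns the session id. */
    public static long guardedConnectWaitsForStartup() {
        ToyServer server = new ToyServer();
        long[] sessionId = new long[1];
        Thread client = new Thread(() -> {
            try {
                sessionId[0] = server.processConnectRequestGuarded(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        client.start();   // "Thread-2": the connection may arrive before startup completes
        server.startup(); // "Thread-1": startup finishes and releases the latch
        try {
            client.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sessionId[0];
    }

    public static void main(String[] args) {
        System.out.println("unguarded NPE before startup: " + unguardedConnectFailsBeforeStartup());
        System.out.println("guarded session id: " + guardedConnectWaitsForStartup());
    }
}
```

The unguarded call corresponds to step 4 of Thread-2 in the first sequence. The guarded call shows that the failure goes away once the connection path cannot proceed before startup has completed, whichever thread happens to run first.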

> Startup race in ZooKeeperServer
> -------------------------------
>
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Priority: Blocker
>             Fix For: 3.4.9
>
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java,
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log,
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8
> (SOLR-8724) I ran into test failures where attempts to create a node in a
> newly started standalone ZooKeeperServer were failing because of an assertion
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then
> registers itself in JMX, but if a connection comes in before the server's JMX
> registration happens, registration of the connection will fail because it
> trips the assertion that (effectively) its parent (the server) has already
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this
> issue with ZK 3.4.6.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)