[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186556#comment-15186556
 ] 

Rakesh R commented on ZOOKEEPER-2383:
-------------------------------------

Thanks [~steve_rowe] for reporting this issue and for the good analysis.

bq. According to git blame, the latest changes around the startup method in 
ZooKeeperServer are due to ZOOKEEPER-1907, which actually turned out to be 
quite problematic, so this could be another issue due to that patch, I'm not 
sure.

[~fpj], sure, I'm happy to investigate this. To understand the impact of 
ZOOKEEPER-1907, I first took the code from before the ZOOKEEPER-1907 commit, 
version {{da3e7e0d4b66ac5a25d40ae2d0102b1b57994b62}}. I debugged that code and 
was able to reproduce the issue even without the ZOOKEEPER-1907 changes.

Coming back to this jira, two issues are reported. IIUC, both are caused by a 
race between server startup and the processing of a client connection 
request. I have tried to work out the sequence of events that leads to each 
failure; please see below.
# NullPointerException while creating session
{code}
2016-03-08 11:29:00,374 [myid:] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:55555:NIOServerCnxnFactory@213] - 
Ignoring unexpected runtime exception
java.lang.NullPointerException
        at 
org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:569)
{code}
+Thread-1: Starting the server+
1=> Invoked {{cnxnFactory.startup(server);}}
2=> Started the NIOServerCxn.Factory thread and registered OP_ACCEPT to accept 
connections
3=> Set the ZooKeeper server on the connection factory
4=> Loaded the ZooKeeper data
5=> Assume the server is about to invoke {{zks.startup();}} and 
{{sessionTracker}} is not yet initialized.
+Thread-2: creating client connection+
1=> Sends a connection request to the server
2=> NIOServerCnxn reads the request and invokes 
{{NIOServerCnxn#readConnectRequest()}}
3=> It then calls {{zkServer.processConnectRequest(this, incomingBuffer);}}
4=> Processing the request needs the {{sessionTracker}} reference, but it is 
not yet initialized because the server is still in the startup phase, which 
causes the NPE.
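
To make the window in the sequence above concrete, here is a minimal sketch. It is not ZooKeeper code: {{SketchServer}}, {{SketchSessionTracker}}, {{createSessionUnguarded()}} and {{createSessionGuarded()}} are hypothetical names invented for illustration, and the guard shown is only one possible way to close the window, not the fix that was committed for this jira.

```java
// Hypothetical, heavily simplified stand-in for ZooKeeperServer (not the real class).
class SketchServer {
    // Hypothetical stand-in for the session tracker.
    static final class SketchSessionTracker {
        private long nextSessionId = 1;

        synchronized long createSession() {
            return nextSessionId++;
        }
    }

    // Stays null until startup() runs, as in the sequence described above.
    private volatile SketchSessionTracker sessionTracker;

    synchronized void startup() {
        if (sessionTracker == null) {
            sessionTracker = new SketchSessionTracker();
        }
    }

    // Unguarded path: dereferences sessionTracker even if startup() has not run yet,
    // so a connect request that arrives too early ends in a NullPointerException.
    long createSessionUnguarded() {
        return sessionTracker.createSession();
    }

    // Guarded path (an assumption for illustration): reject early connect requests
    // with an explicit error instead of failing with an NPE.
    long createSessionGuarded() {
        SketchSessionTracker tracker = sessionTracker;
        if (tracker == null) {
            throw new IllegalStateException("server is still starting up");
        }
        return tracker.createSession();
    }
}

class StartupNpeSketch {
    public static void main(String[] args) {
        SketchServer server = new SketchServer();

        // "Thread-2" arrives before "Thread-1" has called startup().
        try {
            server.createSessionUnguarded();
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException before startup()");
        }
        try {
            server.createSessionGuarded();
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }

        // Once startup() has completed, the same request succeeds.
        server.startup();
        System.out.println("session id after startup: " + server.createSessionGuarded());
    }
}
```

The sketch replays the two threads in a fixed order on one thread so that the failure is deterministic; in the real server the outcome depends on timing.
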
# MBeanRegistry throws an assertion error because the parent does not exist
{code}
2016-03-08 11:29:00,449 [myid:] - WARN  [Thread-0:MBeanRegistry@118] - 
registered bean 'InMemoryDataTree' with parent 'StandaloneServer_port55555' at 
path '/StandaloneServer_port55555'
java.lang.Throwable: 
        at 
org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:116)
{code}
+Thread-1: Starting the server+
1=> Invoked {{cnxnFactory.startup(server);}}
2=> Started the NIOServerCxn.Factory thread and registered OP_ACCEPT to accept 
connections
3=> Set the ZooKeeper server on the connection factory
4=> Loaded the ZooKeeper data
5=> Server invoked {{zks.startup();}}
6=> Started the session tracker
7=> Finished setting up the RequestProcessors
8=> Invoked {{ZooKeeperServer#registerJMX();}}
9=> Now assume ZooKeeperServer has initialized {{jmxServerBean = new 
ZooKeeperServerBean(this);}} and is about to register the bean in the registry 
with {{MBeanRegistry.getInstance().register(jmxServerBean, null);}}
+Thread-2: creating client connection+
1=> Sends a connection request to the server
2=> NIOServerCnxn reads the request and invokes 
{{NIOServerCnxn#readConnectRequest()}}
3=> It then calls {{zkServer.processConnectRequest(this, incomingBuffer);}}
4=> Since all the request processors are ready, it successfully creates the 
session and goes on to register the connection bean
5=> Now it invokes {{zkServer.finishSessionInit()}}, which in turn invokes 
{{serverCnxnFactory.registerConnection(cnxn);}}. The server bean (the parent) 
is not yet registered, so no path is found for it and the registration hits 
the path error.
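
The ordering problem in this second sequence can be sketched in the same way. {{SketchRegistry}} below is a hypothetical, simplified stand-in and not the real {{MBeanRegistry}}: beans are plain strings, the path layout is simplified, and an explicit exception replaces the {{assert path != null}} check so that the failure shows up even when assertions are disabled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for MBeanRegistry (not the real class).
class SketchRegistry {
    private final Map<String, String> beanToPath = new ConcurrentHashMap<>();

    // Registers a bean under its parent and returns the bean's path.
    // A child can only be registered once its parent already has a path.
    String register(String bean, String parent) {
        String parentPath = "";
        if (parent != null) {
            parentPath = beanToPath.get(parent);
            if (parentPath == null) {
                // Stand-in for the "assert path != null" in the quoted code.
                throw new IllegalStateException(
                        "parent '" + parent + "' is not registered yet");
            }
        }
        String path = parentPath + "/" + bean;
        beanToPath.put(bean, path);
        return path;
    }
}

class JmxOrderSketch {
    public static void main(String[] args) {
        SketchRegistry registry = new SketchRegistry();
        String serverBean = "StandaloneServer_port55555";

        // "Thread-2" registers the connection bean before the server bean exists.
        try {
            registry.register("Connections", serverBean);
        } catch (IllegalStateException e) {
            System.out.println("too early: " + e.getMessage());
        }

        // "Thread-1" registers the server bean; the same child registration now works.
        System.out.println(registry.register(serverBean, null));
        System.out.println(registry.register("Connections", serverBean));
    }
}
```

In this sketch the child registration only succeeds after the server bean has its own path, which is the ordering that the real startup sequence does not guarantee.
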

> Startup race in ZooKeeperServer
> -------------------------------
>
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Priority: Blocker
>             Fix For: 3.4.9
>
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log, 
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
