[
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186556#comment-15186556
]
Rakesh R commented on ZOOKEEPER-2383:
-------------------------------------
Thanks [~steve_rowe] for reporting this issue and for the good analysis.
bq. According to git blame, the latest changes around the startup method in
ZooKeeperServer are due to ZOOKEEPER-1907, which actually turned out to be
quite problematic, so this could be another issue due to that patch, I'm not
sure.
[~fpj], sure, I'm happy to investigate this. To understand the impact of
ZOOKEEPER-1907, I first took the code from before the ZOOKEEPER-1907 commit
(version {{da3e7e0d4b66ac5a25d40ae2d0102b1b57994b62}}). I debugged that code
and was able to reproduce the issue even without the ZOOKEEPER-1907 changes.
Coming back to this jira, there are two problems reported. IIUC, both are
caused by a race between server startup and the processing of a client
connection request. I've tried to work out what happens; please see the
sequences below that cause the trouble.
# NullPointerException while creating session
{code}
2016-03-08 11:29:00,374 [myid:] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:55555:NIOServerCnxnFactory@213] -
Ignoring unexpected runtime exception
java.lang.NullPointerException
at
org.apache.zookeeper.server.ZooKeeperServer.createSession(ZooKeeperServer.java:569)
{code}
+Thread-1: Starting the server+
1=> Invoked cnxnFactory.startup(server);
2=> Started NIOServerCxn.Factory thread and registered OP_ACCEPT to accept
connections
3=> sets zookeeper server to the connection factory
4=> loads zookeeper data
5=> Assume server is about to invoke {{zks.startup();}} and {{sessionTracker}}
is not yet initialized.
+Thread-2: creating client connection+
1=> sends connection request to the server
2=> NIOServerCnxn reads the request and invokes
{{NIOServerCnxn#readConnectRequest()}}
3=> It then calls {{zkServer.processConnectRequest(this, incomingBuffer);}}
4=> While processing the request it needs the {{sessionTracker}} reference,
but this is not yet initialized because the server is still in the startup
phase, which causes the NPE.
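To make the interleaving above concrete, here is a minimal sketch that replays it on a single thread so that it is deterministic. All class and method names ({{StartupRaceSketch}}, {{Server}}, {{Tracker}}, {{createSessionGuarded}}) are hypothetical stand-ins, not the real ZooKeeper classes, and the guard shown is only one possible mitigation, not the actual fix.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for illustration only; not the real ZooKeeper classes.
class StartupRaceSketch {

    /** Stand-in for the session tracker: hands out increasing session ids. */
    static final class Tracker {
        private final AtomicLong nextId = new AtomicLong(1);

        long createSession() {
            return nextId.getAndIncrement();
        }
    }

    /** Stand-in for the server: the tracker stays null until startup() runs. */
    static final class Server {
        private volatile Tracker sessionTracker;

        void startup() {
            if (sessionTracker == null) {
                sessionTracker = new Tracker();
            }
        }

        // Mirrors the failing path: dereferences the tracker unconditionally.
        long createSessionUnsafe() {
            return sessionTracker.createSession();
        }

        // One possible guard (an assumption, not the actual fix): reject early connects.
        long createSessionGuarded() {
            Tracker tracker = sessionTracker;
            if (tracker == null) {
                throw new IllegalStateException("server is still starting");
            }
            return tracker.createSession();
        }
    }

    /** Thread-2 step 4 runs before Thread-1 step 5: returns true if an NPE is thrown. */
    static boolean connectBeforeStartupThrowsNpe() {
        Server server = new Server();
        try {
            server.createSessionUnsafe();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Same ordering with the guard: returns true if the connect is rejected cleanly. */
    static boolean guardedConnectIsRejected() {
        Server server = new Server();
        try {
            server.createSessionGuarded();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** The safe ordering: startup completes first, then the session is created. */
    static long connectAfterStartup() {
        Server server = new Server();
        server.startup();
        return server.createSessionUnsafe();
    }

    public static void main(String[] args) {
        System.out.println("NPE before startup: " + connectBeforeStartupThrowsNpe());
        System.out.println("guard rejects early connect: " + guardedConnectIsRejected());
        System.out.println("session id after startup: " + connectAfterStartup());
    }
}
```

In the real server the two sequences run on different threads, so the failure only shows up when the connection request happens to arrive inside the startup window.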
# MBeanRegistry throws assertion error because the parent doesn't exist
{code}
2016-03-08 11:29:00,449 [myid:] - WARN [Thread-0:MBeanRegistry@118] -
registered bean 'InMemoryDataTree' with parent 'StandaloneServer_port55555' at
path '/StandaloneServer_port55555'
java.lang.Throwable:
at
org.apache.zookeeper.jmx.MBeanRegistry.register(MBeanRegistry.java:116)
{code}
+Thread-1: Starting the server+
1=> Invoked cnxnFactory.startup(server);
2=> Started NIOServerCxn.Factory thread and registered OP_ACCEPT to accept
connections
3=> sets zookeeper server to the connection factory
4=> loads zookeeper data
5=> Server invoked {{zks.startup();}}
6=> Started session tracker
7=> Finished setting up RequestProcessors
8=> Invoked {{ZooKeeperServer#registerJMX();}}
9=> Now assume ZooKeeperServer has initialized {{jmxServerBean = new
ZooKeeperServerBean(this);}} and is about to register the bean in the registry
{{MBeanRegistry.getInstance().register(jmxServerBean, null);}}
+Thread-2: creating client connection+
1=> sends connection request to the server
2=> NIOServerCnxn reads the request and invokes
{{NIOServerCnxn#readConnectRequest()}}
3=> It then calls {{zkServer.processConnectRequest(this, incomingBuffer);}}
4=> Since all the request processors are ready, it successfully creates the
session and goes to register the connection bean
5=> Now it invokes {{zkServer.finishSessionInit()}}, which calls
{{serverCnxnFactory.registerConnection(cnxn);}} and hits the path error,
because the server bean has not been registered yet.
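The ordering problem in this second case can be sketched the same way. The registry below is a simplified, hypothetical model of the parent-path lookup quoted in the issue description ({{RegistryOrderSketch}}, {{Registry}} and the bean name {{Connection_1}} are made up for illustration). It throws an {{IllegalStateException}} where the real code uses {{assert path != null}}, so the failure shows up whether or not assertions are enabled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a bean registry; not the real MBeanRegistry.
class RegistryOrderSketch {

    static final class Registry {
        private final Map<String, String> beanToPath = new ConcurrentHashMap<>();

        /** Registers a bean under its parent's path and returns the bean's full path. */
        String register(String bean, String parent) {
            String parentPath = "";
            if (parent != null) {
                parentPath = beanToPath.get(parent);
                if (parentPath == null) {
                    // The real code asserts here; an exception keeps the sketch deterministic.
                    throw new IllegalStateException("parent '" + parent + "' is not registered");
                }
            }
            String path = parentPath + "/" + bean;
            beanToPath.put(bean, path);
            return path;
        }
    }

    static final String SERVER_BEAN = "StandaloneServer_port55555";
    static final String CONNECTION_BEAN = "Connection_1"; // made-up name

    /** Thread-2 step 5 runs before Thread-1 step 9 finishes: returns true if it fails. */
    static boolean connectionBeforeServerFails() {
        Registry registry = new Registry();
        try {
            registry.register(CONNECTION_BEAN, SERVER_BEAN);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** The safe ordering: the server bean is registered before the connection bean. */
    static String connectionAfterServer() {
        Registry registry = new Registry();
        registry.register(SERVER_BEAN, null);
        return registry.register(CONNECTION_BEAN, SERVER_BEAN);
    }

    public static void main(String[] args) {
        System.out.println("fails before server bean: " + connectionBeforeServerFails());
        System.out.println("path after server bean: " + connectionAfterServer());
    }
}
```

The sketch only shows why the order of the two {{register}} calls matters; it does not say how the server should enforce that order.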
> Startup race in ZooKeeperServer
> -------------------------------
>
> Key: ZOOKEEPER-2383
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
> Project: ZooKeeper
> Issue Type: Bug
> Components: jmx, server
> Affects Versions: 3.4.8
> Reporter: Steve Rowe
> Priority: Blocker
> Fix For: 3.4.9
>
> Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java,
> release-3.4.8-extra-logging.patch, zk-3.4.8-MBeanRegistry.log,
> zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8
> (SOLR-8724) I ran into test failures where attempts to create a node in a
> newly started standalone ZooKeeperServer were failing because of an assertion
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then
> registers itself in JMX, but if a connection comes in before the server's JMX
> registration happens, registration of the connection will fail because it
> trips the assertion that (effectively) its parent (the server) has already
> registered itself.
> {code:java|title=ZooKeeperServer.java}
> public synchronized void startup() {
> if (sessionTracker == null) {
> createSessionTracker();
> }
> startSessionTracker();
> setupRequestProcessors();
> registerJMX();
> state = State.RUNNING;
> notifyAll();
> }
> {code}
> {code:java|title=MBeanRegistry.java}
> public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
> throws JMException
> {
> assert bean != null;
> String path = null;
> if (parent != null) {
> path = mapBean2Path.get(parent);
> assert path != null;
> }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this
> issue with ZK 3.4.6.