[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15499350#comment-15499350
 ] 

Rakesh R edited comment on ZOOKEEPER-2383 at 9/17/16 5:29 PM:
--------------------------------------------------------------

Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
[ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch].
 Am I missing any other case apart from the below line of code.
{code}
@@ -735,7 +735,7 @@
                         bb.flip();
 
                         ZooKeeperServer zks = this.zkServer;
-                        if (zks == null) {
+                        if (zks == null || !zks.isRunning()) {
                             throw new IOException("ZK down");
                         }
                         if (initialized) {
{code}

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding the countdown logic in finally so that it will proceed:
{code}
                try {
            Assert.assertFalse(
                    "Should fail to create zk client session as server is not 
fully started",
                    simplezks.waitForSessionCreation(10));
        } finally {
            LOG.info(
                    "Decrements the count of the latch, so that server will 
proceed with startup");
            startupDelayLatch.countDown();
        }
{code}

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Should fail to create zk client session as server is not 
fully started"

bq. Please reduce the test case timeout to no longer than 30s.
Agreed.

FYI, apart from the above, I had fixed [Michael's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673]
 and [Raul's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826]
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.


was (Author: rakeshr):
Thanks [~fpj] for the comments. 

bq. In NettyServerCnxn, it looks like we are only updating the code for 4lws. 
Don't we have to also update it here:

I think, I have handled this case and the following condition present in 
[ZOOKEEPER-2383-br-3-4.patch|https://issues.apache.org/jira/secure/attachment/12821043/ZOOKEEPER-2383-br-3-4.patch].
 Am I missing any other case apart from the below line of code.
{code}
@@ -735,7 +735,7 @@
                         bb.flip();
 
                         ZooKeeperServer zks = this.zkServer;
-                        if (zks == null) {
+                        if (zks == null || !zks.isRunning()) {
                             throw new IOException("ZK down");
                         }
                         if (initialized) {
{code}

bq. but it never exists as the client keeps trying to connect. It sounds like 
some thread is hanging and not letting the test framework exit.
Without patch it fails at {{simplezks.waitForStartupInvocation(10)}}. On the 
other side {{SimpleZooKeeperServer#startup}} thread is waiting at 
startupDelayLatch.await() and this is causing an indefinite wait. How about 
adding a timed out like below:
{code}
        startupDelayLatch.await(15, TimeUnit.SECONDS);
{code}

bq. This sentence doesn't make much sense to me: Since zk server is not started 
createsession method to be invoked
I will modify this to "Should fail to create zk client session as server is not 
fully started"

bq. Please reduce the test case timeout to no longer than 30s.
Agreed.

FYI, apart from the above, I had fixed [Michael's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15409673&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15409673]
 and [Raul's 
comment|https://issues.apache.org/jira/browse/ZOOKEEPER-2383?focusedCommentId=15408826&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15408826]
 in trunk patch, need to rebase branch-3-4 patch to incorporate these comments.

> Startup race in ZooKeeperServer
> -------------------------------
>
>                 Key: ZOOKEEPER-2383
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2383
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: jmx, server
>    Affects Versions: 3.4.8
>            Reporter: Steve Rowe
>            Assignee: Rakesh R
>            Priority: Blocker
>             Fix For: 3.4.10, 3.5.3, 3.6.0
>
>         Attachments: TestZkStandaloneJMXRegistrationRaceConcurrent.java, 
> ZOOKEEPER-2383-br-3-4.patch, ZOOKEEPER-2383.patch, ZOOKEEPER-2383.patch, 
> ZOOKEEPER-2383.patch, release-3.4.8-extra-logging.patch, 
> zk-3.4.8-MBeanRegistry.log, zk-3.4.8-NPE.log
>
>
> In attempting to upgrade Solr's ZooKeeper dependency from 3.4.6 to 3.4.8 
> (SOLR-8724) I ran into test failures where attempts to create a node in a 
> newly started standalone ZooKeeperServer were failing because of an assertion 
> in MBeanRegistry.
> ZooKeeperServer.startup() first sets up its request processor chain then 
> registers itself in JMX, but if a connection comes in before the server's JMX 
> registration happens, registration of the connection will fail because it 
> trips the assertion that (effectively) its parent (the server) has already 
> registered itself.
> {code:java|title=ZooKeeperServer.java}
>     public synchronized void startup() {
>         if (sessionTracker == null) {
>             createSessionTracker();
>         }
>         startSessionTracker();
>         setupRequestProcessors();
>         registerJMX();
>         state = State.RUNNING;
>         notifyAll();
>     }
> {code}
> {code:java|title=MBeanRegistry.java}
>     public void register(ZKMBeanInfo bean, ZKMBeanInfo parent)
>         throws JMException
>     {
>         assert bean != null;
>         String path = null;
>         if (parent != null) {
>             path = mapBean2Path.get(parent);
>             assert path != null;
>         }
> {code}
> This problem appears to be new with ZK 3.4.8 - AFAIK Solr never had this 
> issue with ZK 3.4.6. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to