Karthick, I haven't seen this discussed in the Accumulo community. Can you point me to the conversation there?
On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote: > Why would they need to be daemon threads? > I’m not an expert of Java threading, but afaik I/O threads should not be > daemon threads in most cases. > > Also those threads are Netty internal threads, so this question is better > to be asked in Netty community. > ZK threads reported in jstack are just waiting for input to send/receive. > Do you know at which point Accumulo does stuck? > > Andor > > > > > On 2020. Mar 31., at 14:27, karthick rn <karthick.narend...@gmail.com> > wrote: > > > > Hi Enrico, > > > > Yes, I have already run this through Accumulo folks they have looked at > the > > jstack output & advised to check with ZK devs if those 2 threads (#27 & > > #30) are expected to be non-daemon threads. > > Also, in this cluster we have wire encryption enabled only for ZK and by > > disabling it we don't encounter this issue. > > There are no error messages reported on the ZK server log, below are the > > INFO messages when running the "accumulo-service master start" command > > > > 2020-03-31 12:16:28,626 [myid:2] - INFO > > [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated > Id > > 'CN=host2' for Scheme 'x509' > > > > 2020-03-31 12:16:28,676 [myid:2] - INFO > > [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2 > > IP>:46332 > > > > 2020-03-31 12:16:28,676 [myid:2] - INFO > > [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2 > > IP>:46332 > > > > This issue is reproducible everytime I start Accumulo master. Let me know > > for any further details? > > > > Many thanks > > > > Regards, > > Karthick > > > > > > > > > > > > > > > > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eolive...@gmail.com> > wrote: > > > >> Hi, > >> Did you check with Accumulo community? > >> Do you see errors or informational messages in ZK server logs? > >> > >> Enrico > >> > >> Il Mar 31 Mar 2020, 01:12 karthick rn <karthick.narend...@gmail.com> ha > >> scritto: > >> > >>> Hello dev team, > >>> > >>> We are using Hadoop, Accumulo & Zookeeper in our environment, after > >>> enabling TLS for ZK we noticed that starting Accumulo master service > >> hangs > >>> in an intermediate process as shown below and require to kill the > process > >>> in-order for Accumulo master to start. > >>> > >>> [user1@host1 ~]$ jps -m > >>> > >>> 23314 JournalNode > >>> > >>> 23011 NameNode > >>> > >>> 23539 DFSZKFailoverController > >>> > >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL* > >>> > >>> 22590 QuorumPeerMain > >>> > >>> 89790 Jps -m > >>> > >>> > >>> [user1@host1 ~]$ *kill -9 84118* > >>> > >>> [user1@host1 ~]$ jps -m > >>> > >>> 23314 JournalNode > >>> > >>> 23011 NameNode > >>> > >>> 23539 DFSZKFailoverController > >>> > >>> 89892 Jps -m > >>> > >>> *89847 Main master* > >>> > >>> 22590 QuorumPeerMain > >>> > >>> [user1@host1 ~]$ > >>> > >>> Jstack collected during the hang shows 2 non-daemon threads (#27 & #30) > >>> while the rest are daemon threads. Would like to check with the dev > team > >> if > >>> "nioEventLoopGroup" threads are expected to be non-daemon? If so, any > >>> thoughts on what else might be causing the issue? > >>> I have copied only a portion of the jstack output, let me know in-case > >> you > >>> need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop > >> 3.2.1 & > >>> Accumulo 2.0. Let me know if you need any further details? Many thanks > >>> > >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)" > >> #25 > >>> daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s > >> tid=0x0000000003a10800 > >>> nid=0x1624e waiting on condition [0x00007f5c7bd67000] > >>> java.lang.Thread.State: TIMED_WAITING (parking) > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method) > >>> - parking to wait for <0x000000070f0acf38> (a > >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6 > >>> /LockSupport.java:234) > >>> at > >>> > >>> > >> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6 > >>> /AbstractQueuedSynchronizer.java:2123) > >>> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6 > >>> /LinkedBlockingDeque.java:513) > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6 > >>> /LinkedBlockingDeque.java:675) > >>> at > >>> > >>> > >> > org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278) > >>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) > >>> > >>> Locked ownable synchronizers: > >>> - None > >>> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon > >>> prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800 > >>> nid=0x1624f waiting on condition [0x00007f5c7bc66000] > >>> java.lang.Thread.State: WAITING (parking) > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method) > >>> - parking to wait for <0x000000070f0f8af0> (a > >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6 > >>> /LockSupport.java:194) > >>> at > >>> > >>> > >> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6 > >>> /AbstractQueuedSynchronizer.java:2081) > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6 > >>> /LinkedBlockingQueue.java:433) > >>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) > >>> > >>> Locked ownable synchronizers: > >>> - None > >>> > >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms > >> elapsed=95.28s > >>> tid=0x0000000003cd2800 nid=0x16250 runnable [0x00007f5c7d6b9000] > >>> java.lang.Thread.State: RUNNABLE > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method) > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6 > >>> /EPollSelectorImpl.java:120) > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6 > >>> /SelectorImpl.java:124) > >>> - locked <0x000000070f079a28> (a > >>> io.netty.channel.nio.SelectedSelectionKeySet) > >>> - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl) > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6 > >> /SelectorImpl.java:141) > >>> at > >>> > >>> > >> > io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68) > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803) > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457) > >>> at > >>> > >>> > >> > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > >>> at > >>> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > >>> at > >>> > >>> > >> > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834) > >>> > >>> Locked ownable synchronizers: > >>> - None > >>> > >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)" > >> #28 > >>> daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s > tid=0x0000000005942800 > >>> nid=0x16259 waiting on condition [0x00007f5c7b145000] > >>> java.lang.Thread.State: TIMED_WAITING (parking) > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method) > >>> - parking to wait for <0x000000070f2acbc0> (a > >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > >>> at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6 > >>> /LockSupport.java:234) > >>> at > >>> > >>> > >> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6 > >>> /AbstractQueuedSynchronizer.java:2123) > >>> at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6 > >>> /LinkedBlockingDeque.java:513) > >>> at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6 > >>> /LinkedBlockingDeque.java:675) > >>> at > >>> > >>> > >> > org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278) > >>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) > >>> > >>> Locked ownable synchronizers: > >>> - None > >>> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon > >>> prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800 > >>> nid=0x1625a waiting on condition [0x00007f5c7b044000] > >>> java.lang.Thread.State: WAITING (parking) > >>> at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method) > >>> - parking to wait for <0x000000070f2adff8> (a > >>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > >>> at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6 > >>> /LockSupport.java:194) > >>> at > >>> > >>> > >> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6 > >>> /AbstractQueuedSynchronizer.java:2081) > >>> at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6 > >>> /LinkedBlockingQueue.java:433) > >>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) > >>> > >>> Locked ownable synchronizers: > >>> - None > >>> > >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms > >> elapsed=94.30s > >>> tid=0x00000000044af800 nid=0x1625b runnable [0x00007f5c7b646000] > >>> java.lang.Thread.State: RUNNABLE > >>> at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method) > >>> at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6 > >>> /EPollSelectorImpl.java:120) > >>> at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6 > >>> /SelectorImpl.java:124) > >>> - locked <0x000000070f2ab868> (a > >>> io.netty.channel.nio.SelectedSelectionKeySet) > >>> - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl) > >>> at sun.nio.ch.SelectorImpl.select(java.base@11.0.6 > >> /SelectorImpl.java:141) > >>> at > >>> > >>> > >> > io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68) > >>> at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803) > >>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457) > >>> at > >>> > >>> > >> > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > >>> at > >>> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > >>> at > >>> > >>> > >> > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > >>> at java.lang.Thread.run(java.base@11.0.6/Thread.java:834) > >>> > >>> Locked ownable synchronizers: > >>> - None > >>> > >> > >