Christopher, I have created an issue (https://github.com/apache/accumulo/issues/1578) on the GitHub page, FYI.
Thanks,
Karthick

On Wed, 1 Apr 2020 at 11:58, karthick rn <karthick.narend...@gmail.com> wrote:

> Hi Christopher, I have talked through this issue with Keith internally but
> haven't raised it on an official channel for discussion, because I suspected
> the issue could be related to the ZK / Netty framework, as we are seeing this
> only after enabling TLS.
> Maybe I should have opened an issue on Accumulo git first. I'm doing this
> now.
>
> Thanks,
> Karthick
>
> On Wed, 1 Apr 2020 at 08:51, Christopher <ctubb...@apache.org> wrote:
>
>> Karthick, I haven't seen this discussed in the Accumulo community. Can you
>> point me to the conversation there?
>>
>> On Wed, Apr 1, 2020 at 2:19 AM Andor Molnar <an...@apache.org> wrote:
>>
>> > Why would they need to be daemon threads?
>> > I’m not an expert in Java threading, but afaik I/O threads should not be
>> > daemon threads in most cases.
>> >
>> > Also, those threads are Netty internal threads, so this question is
>> > better asked in the Netty community.
>> > The ZK threads reported in jstack are just waiting for input to
>> > send/receive.
>> > Do you know at which point Accumulo gets stuck?
>> >
>> > Andor
>> >
>> > > On 2020. Mar 31., at 14:27, karthick rn <karthick.narend...@gmail.com> wrote:
>> > >
>> > > Hi Enrico,
>> > >
>> > > Yes, I have already run this past the Accumulo folks; they looked at the
>> > > jstack output & advised checking with the ZK devs whether those 2 threads
>> > > (#27 & #30) are expected to be non-daemon threads.
>> > > Also, in this cluster we have wire encryption enabled only for ZK, and
>> > > with it disabled we don't encounter this issue.
>> > > There are no error messages reported in the ZK server log; below are the
>> > > INFO messages logged when running the "accumulo-service master start"
>> > > command:
>> > >
>> > > 2020-03-31 12:16:28,626 [myid:2] - INFO [nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated Id 'CN=host2' for Scheme 'x509'
>> > > 2020-03-31 12:16:28,676 [myid:2] - INFO [nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2 IP>:46332
>> > > 2020-03-31 12:16:28,676 [myid:2] - INFO [nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2 IP>:46332
>> > >
>> > > This issue is reproducible every time I start the Accumulo master. Let me
>> > > know if you need any further details.
>> > >
>> > > Many thanks
>> > >
>> > > Regards,
>> > > Karthick
>> > >
>> > > On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eolive...@gmail.com> wrote:
>> > >
>> > >> Hi,
>> > >> Did you check with the Accumulo community?
>> > >> Do you see errors or informational messages in the ZK server logs?
>> > >>
>> > >> Enrico
>> > >>
>> > >> On Tue, 31 Mar 2020, 01:12 karthick rn <karthick.narend...@gmail.com> wrote:
>> > >>
>> > >>> Hello dev team,
>> > >>>
>> > >>> We are using Hadoop, Accumulo & ZooKeeper in our environment. After
>> > >>> enabling TLS for ZK we noticed that starting the Accumulo master service
>> > >>> hangs in an intermediate process, as shown below, and we have to kill
>> > >>> that process in order for the Accumulo master to start.
>> > >>>
>> > >>> [user1@host1 ~]$ jps -m
>> > >>> 23314 JournalNode
>> > >>> 23011 NameNode
>> > >>> 23539 DFSZKFailoverController
>> > >>> *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
>> > >>> 22590 QuorumPeerMain
>> > >>> 89790 Jps -m
>> > >>>
>> > >>> [user1@host1 ~]$ *kill -9 84118*
>> > >>> [user1@host1 ~]$ jps -m
>> > >>> 23314 JournalNode
>> > >>> 23011 NameNode
>> > >>> 23539 DFSZKFailoverController
>> > >>> 89892 Jps -m
>> > >>> *89847 Main master*
>> > >>> 22590 QuorumPeerMain
>> > >>> [user1@host1 ~]$
>> > >>>
>> > >>> The jstack output collected during the hang shows 2 non-daemon threads
>> > >>> (#27 & #30), while the rest are daemon threads. I would like to check
>> > >>> with the dev team whether "nioEventLoopGroup" threads are expected to be
>> > >>> non-daemon. If so, any thoughts on what else might be causing the issue?
>> > >>> I have copied only a portion of the jstack output; let me know in case
>> > >>> you need the full output. FYI, I'm using Apache ZooKeeper 3.5.7, Hadoop
>> > >>> 3.2.1 & Accumulo 2.0. Let me know if you need any further details.
>> > >>>
>> > >>> Many thanks
>> > >>>
>> > >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)" #25 daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s tid=0x0000000003a10800 nid=0x1624e waiting on condition [0x00007f5c7bd67000]
>> > >>>    java.lang.Thread.State: TIMED_WAITING (parking)
>> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>>     - parking to wait for <0x000000070f0acf38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>>     at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:2123)
>> > >>>     at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6/LinkedBlockingDeque.java:513)
>> > >>>     at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6/LinkedBlockingDeque.java:675)
>> > >>>     at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
>> > >>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>> > >>>
>> > >>>    Locked ownable synchronizers:
>> > >>>     - None
>> > >>>
>> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800 nid=0x1624f waiting on condition [0x00007f5c7bc66000]
>> > >>>    java.lang.Thread.State: WAITING (parking)
>> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>>     - parking to wait for <0x000000070f0f8af0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>>     at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
>> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
>> > >>>     at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6/LinkedBlockingQueue.java:433)
>> > >>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>> > >>>
>> > >>>    Locked ownable synchronizers:
>> > >>>     - None
>> > >>>
>> > >>> "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms elapsed=95.28s tid=0x0000000003cd2800 nid=0x16250 runnable [0x00007f5c7d6b9000]
>> > >>>    java.lang.Thread.State: RUNNABLE
>> > >>>     at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
>> > >>>     at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6/EPollSelectorImpl.java:120)
>> > >>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6/SelectorImpl.java:124)
>> > >>>     - locked <0x000000070f079a28> (a io.netty.channel.nio.SelectedSelectionKeySet)
>> > >>>     - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
>> > >>>     at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
>> > >>>     at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>> > >>>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>> > >>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>> > >>>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>> > >>>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>> > >>>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> > >>>     at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>> > >>>
>> > >>>    Locked ownable synchronizers:
>> > >>>     - None
>> > >>>
>> > >>> "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)" #28 daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800 nid=0x16259 waiting on condition [0x00007f5c7b145000]
>> > >>>    java.lang.Thread.State: TIMED_WAITING (parking)
>> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>>     - parking to wait for <0x000000070f2acbc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>>     at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:2123)
>> > >>>     at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6/LinkedBlockingDeque.java:513)
>> > >>>     at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6/LinkedBlockingDeque.java:675)
>> > >>>     at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
>> > >>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
>> > >>>
>> > >>>    Locked ownable synchronizers:
>> > >>>     - None
>> > >>>
>> > >>> "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800 nid=0x1625a waiting on condition [0x00007f5c7b044000]
>> > >>>    java.lang.Thread.State: WAITING (parking)
>> > >>>     at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>> > >>>     - parking to wait for <0x000000070f2adff8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>> > >>>     at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
>> > >>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
>> > >>>     at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6/LinkedBlockingQueue.java:433)
>> > >>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>> > >>>
>> > >>>    Locked ownable synchronizers:
>> > >>>     - None
>> > >>>
>> > >>> "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms elapsed=94.30s tid=0x00000000044af800 nid=0x1625b runnable [0x00007f5c7b646000]
>> > >>>    java.lang.Thread.State: RUNNABLE
>> > >>>     at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
>> > >>>     at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6/EPollSelectorImpl.java:120)
>> > >>>     at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6/SelectorImpl.java:124)
>> > >>>     - locked <0x000000070f2ab868> (a io.netty.channel.nio.SelectedSelectionKeySet)
>> > >>>     - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
>> > >>>     at sun.nio.ch.SelectorImpl.select(java.base@11.0.6/SelectorImpl.java:141)
>> > >>>     at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
>> > >>>     at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
>> > >>>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
>> > >>>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>> > >>>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>> > >>>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> > >>>     at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
>> > >>>
>> > >>>    Locked ownable synchronizers:
>> > >>>     - None
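
For anyone triaging a similar hang: the check discussed in this thread (spotting threads whose jstack header lacks the `daemon` flag) can be sketched as a small Java helper. This is only an illustrative sketch and was not posted by anyone in the thread. The class name `NonDaemonThreads` and the method `nonDaemon` are hypothetical, and it assumes JDK 11 style headers of the form `"name" #N [daemon] prio=...` as seen in the dump quoted in this thread. JVM-internal threads that have no `#N` id are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the threads in a jstack dump that are NOT daemon threads.
final class NonDaemonThreads {

    // Assumed header shape: "thread name" #<id> [daemon ]prio=<n> ...
    private static final Pattern HEADER =
            Pattern.compile("\"(.*)\" #(\\d+) (daemon )?prio=\\d+");

    // Returns "name #id" for every thread header that lacks the daemon flag.
    static List<String> nonDaemon(String dump) {
        List<String> result = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            if (m.find() && m.group(3) == null) {
                result.add(m.group(1) + " #" + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two headers taken from the dump in this thread (emphasis asterisks removed).
        String dump =
                "\"org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)\" "
                        + "#25 daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s\n"
                        + "\"nioEventLoopGroup-2-1\" #27 prio=10 os_prio=0 cpu=718.22ms elapsed=95.28s\n";
        for (String t : nonDaemon(dump)) {
            System.out.println(t); // prints: nioEventLoopGroup-2-1 #27
        }
    }
}
```

A JVM exits only once all non-daemon threads have finished, so any thread this reports is a candidate for keeping a short-lived process such as SetGoalState alive. Whether that is the actual cause here is still an open question in the thread.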