karthick-rn opened a new issue #1578: Accumulo master hangs after TLS on ZK URL: https://github.com/apache/accumulo/issues/1578 After enabling TLS for ZK, I noticed that starting Accumulo master hangs in an intermediate process as shown below and require to kill the process in-order for Accumulo master to start. `[user1@host1 ~]$ jps -m 23314 JournalNode 23011 NameNode 23539 DFSZKFailoverController **84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL** 22590 QuorumPeerMain 89790 Jps -m [user1@host1 ~]$ **kill -9 84118** [user1@host1 ~]$ jps -m 23314 JournalNode 23011 NameNode 23539 DFSZKFailoverController 89892 Jps -m 89847 **Main master** 22590 QuorumPeerMain [user1@host1 ~]$ ` Below is the jstack output collected during the hang `2020-04-01 12:16:50 Full thread dump OpenJDK 64-Bit Server VM (11.0.6+10-LTS mixed mode, sharing): Threads class SMR info: _java_thread_list=0x00000000017945d0, length=18, elements={ 0x0000000000715800, 0x0000000000718000, 0x0000000000726800, 0x0000000000728800, 0x000000000072b800, 0x0000000000735800, 0x0000000000858000, 0x0000000000869800, 0x000000000066a800, 0x000000000180f800, 0x00000000018f4000, 0x0000000001d4e000, 0x0000000001d4f800, 0x0000000001fdf800, 0x0000000001ed8800, 0x0000000003281000, 0x0000000003282800, 0x00000000019ed000 } "Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.28ms elapsed=179.54s tid=0x0000000000715800 nid=0x5f95 waiting on condition [0x00007fa1630b4000] java.lang.Thread.State: RUNNABLE at java.lang.ref.Reference.waitForReferencePendingList([email protected]/Native Method) at java.lang.ref.Reference.processPendingReferences([email protected]/Reference.java:241) at java.lang.ref.Reference$ReferenceHandler.run([email protected]/Reference.java:213) Locked ownable synchronizers: - None "Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.56ms elapsed=179.54s tid=0x0000000000718000 nid=0x5f96 in Object.wait() [0x00007fa162fb3000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait([email protected]/Native Method) - waiting on <0x000000070000a2b0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155) - waiting to re-lock in wait() <0x000000070000a2b0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:176) at java.lang.ref.Finalizer$FinalizerThread.run([email protected]/Finalizer.java:170) Locked ownable synchronizers: - None "Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.46ms elapsed=179.53s tid=0x0000000000726800 nid=0x5f97 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 cpu=1496.64ms elapsed=179.53s tid=0x0000000000728800 nid=0x5f98 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE No compile task Locked ownable synchronizers: - None "C1 CompilerThread0" #8 daemon prio=9 os_prio=0 cpu=1079.64ms elapsed=179.53s tid=0x000000000072b800 nid=0x5f99 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE No compile task Locked ownable synchronizers: - None "Sweeper thread" #9 daemon prio=9 os_prio=0 cpu=72.66ms elapsed=179.53s tid=0x0000000000735800 nid=0x5f9a runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Service Thread" #10 daemon prio=9 os_prio=0 cpu=0.13ms elapsed=179.50s tid=0x0000000000858000 nid=0x5f9b runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Common-Cleaner" #11 daemon prio=8 os_prio=0 cpu=0.43ms elapsed=179.49s tid=0x0000000000869800 nid=0x5f9d in Object.wait() [0x00007fa15c381000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait([email protected]/Native Method) - waiting on <0x00000007000a7668> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155) - waiting to re-lock in wait() <0x00000007000a7668> (a java.lang.ref.ReferenceQueue$Lock) at jdk.internal.ref.CleanerImpl.run([email protected]/CleanerImpl.java:148) at java.lang.Thread.run([email protected]/Thread.java:834) at jdk.internal.misc.InnocuousThread.run([email protected]/InnocuousThread.java:134) Locked ownable synchronizers: - None "DestroyJavaVM" #15 prio=5 os_prio=0 cpu=824.32ms elapsed=178.73s tid=0x000000000066a800 nid=0x5f90 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #18 daemon prio=5 os_prio=0 cpu=0.16ms elapsed=178.29s tid=0x000000000180f800 nid=0x5fa5 in Object.wait() [0x00007fa15b4f3000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait([email protected]/Native Method) - waiting on <0x00000007077c4ab8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155) - waiting to re-lock in wait() <0x00000007077c4ab8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:176) at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3762) at java.lang.Thread.run([email protected]/Thread.java:834) Locked ownable synchronizers: - None "client DomainSocketWatcher" #19 daemon prio=5 os_prio=0 cpu=1.85ms elapsed=177.83s tid=0x00000000018f4000 nid=0x5fa6 runnable [0x00007fa15a7cc000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503) at java.lang.Thread.run([email protected]/Thread.java:834) Locked ownable synchronizers: - None "org.apache.accumulo.master.state.SetGoalState-SendThread(kn-fix-0:2281)" #25 daemon prio=5 os_prio=0 cpu=142.24ms elapsed=177.47s tid=0x0000000001d4e000 nid=0x5fab waiting on condition [0x00007fa159ec3000] java.lang.Thread.State: TIMED_WAITING (parking) at jdk.internal.misc.Unsafe.park([email protected]/Native Method) - parking to wait for <0x000000070eddc9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos([email protected]/AbstractQueuedSynchronizer.java:2123) at java.util.concurrent.LinkedBlockingDeque.pollFirst([email protected]/LinkedBlockingDeque.java:513) at java.util.concurrent.LinkedBlockingDeque.poll([email protected]/LinkedBlockingDeque.java:675) at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) Locked ownable synchronizers: - None "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon prio=5 os_prio=0 cpu=1.06ms elapsed=177.47s tid=0x0000000001d4f800 nid=0x5fac waiting on condition [0x00007fa159dc2000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park([email protected]/Native Method) - parking to wait for <0x000000070ee25888> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:2081) at java.util.concurrent.LinkedBlockingQueue.take([email protected]/LinkedBlockingQueue.java:433) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) Locked ownable synchronizers: - None "nioEventLoopGroup-2-1" #27 prio=10 os_prio=0 cpu=776.37ms elapsed=177.24s tid=0x0000000001fdf800 nid=0x5fae runnable [0x00007fa15b917000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPoll.wait([email protected]/Native Method) at sun.nio.ch.EPollSelectorImpl.doSelect([email protected]/EPollSelectorImpl.java:120) at sun.nio.ch.SelectorImpl.lockAndDoSelect([email protected]/SelectorImpl.java:124) - locked <0x000000070edab1a0> (a io.netty.channel.nio.SelectedSelectionKeySet) - locked <0x000000070ed9edb8> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select([email protected]/SelectorImpl.java:141) at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68) at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run([email protected]/Thread.java:834) Locked ownable synchronizers: - None "org.apache.accumulo.master.state.SetGoalState-SendThread(kn-fix-4:2281)" #28 daemon prio=5 os_prio=0 cpu=12.24ms elapsed=176.19s tid=0x0000000001ed8800 nid=0x5fb7 waiting on condition [0x00007fa159357000] java.lang.Thread.State: TIMED_WAITING (parking) at jdk.internal.misc.Unsafe.park([email protected]/Native Method) - parking to wait for <0x000000070efd03b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos([email protected]/AbstractQueuedSynchronizer.java:2123) at java.util.concurrent.LinkedBlockingDeque.pollFirst([email protected]/LinkedBlockingDeque.java:513) at java.util.concurrent.LinkedBlockingDeque.poll([email protected]/LinkedBlockingDeque.java:675) at org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) Locked ownable synchronizers: - None "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon prio=5 os_prio=0 cpu=0.34ms elapsed=176.19s tid=0x0000000003281000 nid=0x5fb8 waiting on condition [0x00007fa159256000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park([email protected]/Native Method) - parking to wait for <0x000000070efd17e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:2081) at java.util.concurrent.LinkedBlockingQueue.take([email protected]/LinkedBlockingQueue.java:433) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) Locked ownable synchronizers: - None "nioEventLoopGroup-3-1" #30 prio=10 os_prio=0 cpu=342.16ms elapsed=176.18s tid=0x0000000003282800 nid=0x5fb9 runnable [0x00007fa159155000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPoll.wait([email protected]/Native Method) at sun.nio.ch.EPollSelectorImpl.doSelect([email protected]/EPollSelectorImpl.java:120) at sun.nio.ch.SelectorImpl.lockAndDoSelect([email protected]/SelectorImpl.java:124) - locked <0x000000070efcf058> (a io.netty.channel.nio.SelectedSelectionKeySet) - locked <0x000000070efcee30> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select([email protected]/SelectorImpl.java:141) at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68) at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run([email protected]/Thread.java:834) Locked ownable synchronizers: - None "Attach Listener" #31 daemon prio=9 os_prio=0 cpu=0.74ms elapsed=0.14s tid=0x00000000019ed000 nid=0x6153 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "VM Thread" os_prio=0 cpu=154.07ms elapsed=179.54s tid=0x0000000000712800 nid=0x5f94 runnable "GC Thread#0" os_prio=0 cpu=28.68ms elapsed=179.56s tid=0x000000000067e800 nid=0x5f91 runnable "CMS Main Thread" os_prio=0 cpu=1010.50ms elapsed=179.55s tid=0x00000000006ee800 nid=0x5f93 runnable "CMS Thread#0" os_prio=0 cpu=1.06ms elapsed=179.55s tid=0x00000000006ec000 nid=0x5f92 runnable "CMS Thread#1" os_prio=0 cpu=1.07ms elapsed=177.50s tid=0x0000000001d34000 nid=0x5faa runnable "VM Periodic Task Thread" os_prio=0 cpu=103.05ms elapsed=179.49s tid=0x0000000000873800 nid=0x5f9c waiting on condition JNI global refs: 19, weak refs: 0` I & @keith-turner discussed this issue & looked at the jstack output, noticed there are 2 non-daemon threads (no. 27 & 30). As per the description on this [link](https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#setDaemon(boolean)) - JVM exits when the only thread running are all daemon threads. Based on which we wanted to check with ZK dev team if those 2 threads are expected to be non-daemon threads. Let me know if anyone else noticed this issue? I'm using ZK 3.5.7, Accumulo 2.0, Hadoop 3.2.1. Thanks Karthick
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
