karthick-rn opened a new issue #1578: Accumulo master hangs after TLS on ZK
URL: https://github.com/apache/accumulo/issues/1578
 
 
   After enabling TLS for ZK, I noticed that starting Accumulo master hangs in 
an intermediate process as shown below and require to kill the process in-order 
for Accumulo master to start. 
   
   `[user1@host1 ~]$ jps -m
   23314 JournalNode
   23011 NameNode
   23539 DFSZKFailoverController
   **84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL**
   22590 QuorumPeerMain 
   89790 Jps -m
   
   [user1@host1 ~]$ **kill -9 84118**
   
   [user1@host1 ~]$ jps -m
   23314 JournalNode
   23011 NameNode
   23539 DFSZKFailoverController
   89892 Jps -m
   89847 **Main master**
   22590 QuorumPeerMain
   [user1@host1 ~]$ `
   
   Below is the jstack output collected during the hang
   
   `2020-04-01 12:16:50
   Full thread dump OpenJDK 64-Bit Server VM (11.0.6+10-LTS mixed mode, 
sharing):
   
   Threads class SMR info:
   _java_thread_list=0x00000000017945d0, length=18, elements={
   0x0000000000715800, 0x0000000000718000, 0x0000000000726800, 
0x0000000000728800,
   0x000000000072b800, 0x0000000000735800, 0x0000000000858000, 
0x0000000000869800,
   0x000000000066a800, 0x000000000180f800, 0x00000000018f4000, 
0x0000000001d4e000,
   0x0000000001d4f800, 0x0000000001fdf800, 0x0000000001ed8800, 
0x0000000003281000,
   0x0000000003282800, 0x00000000019ed000
   }
   
   "Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.28ms elapsed=179.54s 
tid=0x0000000000715800 nid=0x5f95 waiting on condition  [0x00007fa1630b4000]
      java.lang.Thread.State: RUNNABLE
        at 
java.lang.ref.Reference.waitForReferencePendingList([email protected]/Native 
Method)
        at 
java.lang.ref.Reference.processPendingReferences([email protected]/Reference.java:241)
        at 
java.lang.ref.Reference$ReferenceHandler.run([email protected]/Reference.java:213)
   
      Locked ownable synchronizers:
        - None
   
   "Finalizer" #3 daemon prio=8 os_prio=0 cpu=0.56ms elapsed=179.54s 
tid=0x0000000000718000 nid=0x5f96 in Object.wait()  [0x00007fa162fb3000]
      java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x000000070000a2b0> (a java.lang.ref.ReferenceQueue$Lock)
        at 
java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155)
        - waiting to re-lock in wait() <0x000000070000a2b0> (a 
java.lang.ref.ReferenceQueue$Lock)
        at 
java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:176)
        at 
java.lang.ref.Finalizer$FinalizerThread.run([email protected]/Finalizer.java:170)
   
      Locked ownable synchronizers:
        - None
   
   "Signal Dispatcher" #4 daemon prio=9 os_prio=0 cpu=0.46ms elapsed=179.53s 
tid=0x0000000000726800 nid=0x5f97 runnable  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
        - None
   
   "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 cpu=1496.64ms 
elapsed=179.53s tid=0x0000000000728800 nid=0x5f98 waiting on condition  
[0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
      No compile task
   
      Locked ownable synchronizers:
        - None
   
   "C1 CompilerThread0" #8 daemon prio=9 os_prio=0 cpu=1079.64ms 
elapsed=179.53s tid=0x000000000072b800 nid=0x5f99 waiting on condition  
[0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
      No compile task
   
      Locked ownable synchronizers:
        - None
   
   "Sweeper thread" #9 daemon prio=9 os_prio=0 cpu=72.66ms elapsed=179.53s 
tid=0x0000000000735800 nid=0x5f9a runnable  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
        - None
   
   "Service Thread" #10 daemon prio=9 os_prio=0 cpu=0.13ms elapsed=179.50s 
tid=0x0000000000858000 nid=0x5f9b runnable  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
        - None
   
   "Common-Cleaner" #11 daemon prio=8 os_prio=0 cpu=0.43ms elapsed=179.49s 
tid=0x0000000000869800 nid=0x5f9d in Object.wait()  [0x00007fa15c381000]
      java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x00000007000a7668> (a java.lang.ref.ReferenceQueue$Lock)
        at 
java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155)
        - waiting to re-lock in wait() <0x00000007000a7668> (a 
java.lang.ref.ReferenceQueue$Lock)
        at 
jdk.internal.ref.CleanerImpl.run([email protected]/CleanerImpl.java:148)
        at java.lang.Thread.run([email protected]/Thread.java:834)
        at 
jdk.internal.misc.InnocuousThread.run([email protected]/InnocuousThread.java:134)
   
      Locked ownable synchronizers:
        - None
   
   "DestroyJavaVM" #15 prio=5 os_prio=0 cpu=824.32ms elapsed=178.73s 
tid=0x000000000066a800 nid=0x5f90 waiting on condition  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
        - None
   
   "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" 
#18 daemon prio=5 os_prio=0 cpu=0.16ms elapsed=178.29s tid=0x000000000180f800 
nid=0x5fa5 in Object.wait()  [0x00007fa15b4f3000]
      java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait([email protected]/Native Method)
        - waiting on <0x00000007077c4ab8> (a java.lang.ref.ReferenceQueue$Lock)
        at 
java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:155)
        - waiting to re-lock in wait() <0x00000007077c4ab8> (a 
java.lang.ref.ReferenceQueue$Lock)
        at 
java.lang.ref.ReferenceQueue.remove([email protected]/ReferenceQueue.java:176)
        at 
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3762)
        at java.lang.Thread.run([email protected]/Thread.java:834)
   
      Locked ownable synchronizers:
        - None
   
   "client DomainSocketWatcher" #19 daemon prio=5 os_prio=0 cpu=1.85ms 
elapsed=177.83s tid=0x00000000018f4000 nid=0x5fa6 runnable  [0x00007fa15a7cc000]
      java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(Native Method)
        at 
org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(DomainSocketWatcher.java:52)
        at 
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:503)
        at java.lang.Thread.run([email protected]/Thread.java:834)
   
      Locked ownable synchronizers:
        - None
   
   "org.apache.accumulo.master.state.SetGoalState-SendThread(kn-fix-0:2281)" 
#25 daemon prio=5 os_prio=0 cpu=142.24ms elapsed=177.47s tid=0x0000000001d4e000 
nid=0x5fab waiting on condition  [0x00007fa159ec3000]
      java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000070eddc9d0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos([email protected]/AbstractQueuedSynchronizer.java:2123)
        at 
java.util.concurrent.LinkedBlockingDeque.pollFirst([email protected]/LinkedBlockingDeque.java:513)
        at 
java.util.concurrent.LinkedBlockingDeque.poll([email protected]/LinkedBlockingDeque.java:675)
        at 
org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
   
      Locked ownable synchronizers:
        - None
   
   "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon 
prio=5 os_prio=0 cpu=1.06ms elapsed=177.47s tid=0x0000000001d4f800 nid=0x5fac 
waiting on condition  [0x00007fa159dc2000]
      java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000070ee25888> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:2081)
        at 
java.util.concurrent.LinkedBlockingQueue.take([email protected]/LinkedBlockingQueue.java:433)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
   
      Locked ownable synchronizers:
        - None
   
   "nioEventLoopGroup-2-1" #27 prio=10 os_prio=0 cpu=776.37ms elapsed=177.24s 
tid=0x0000000001fdf800 nid=0x5fae runnable  [0x00007fa15b917000]
      java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPoll.wait([email protected]/Native Method)
        at 
sun.nio.ch.EPollSelectorImpl.doSelect([email protected]/EPollSelectorImpl.java:120)
        at 
sun.nio.ch.SelectorImpl.lockAndDoSelect([email protected]/SelectorImpl.java:124)
        - locked <0x000000070edab1a0> (a 
io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x000000070ed9edb8> (a sun.nio.ch.EPollSelectorImpl)
        at 
sun.nio.ch.SelectorImpl.select([email protected]/SelectorImpl.java:141)
        at 
io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run([email protected]/Thread.java:834)
   
      Locked ownable synchronizers:
        - None
   
   "org.apache.accumulo.master.state.SetGoalState-SendThread(kn-fix-4:2281)" 
#28 daemon prio=5 os_prio=0 cpu=12.24ms elapsed=176.19s tid=0x0000000001ed8800 
nid=0x5fb7 waiting on condition  [0x00007fa159357000]
      java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000070efd03b0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:234)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos([email protected]/AbstractQueuedSynchronizer.java:2123)
        at 
java.util.concurrent.LinkedBlockingDeque.pollFirst([email protected]/LinkedBlockingDeque.java:513)
        at 
java.util.concurrent.LinkedBlockingDeque.poll([email protected]/LinkedBlockingDeque.java:675)
        at 
org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
   
      Locked ownable synchronizers:
        - None
   
   "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon 
prio=5 os_prio=0 cpu=0.34ms elapsed=176.19s tid=0x0000000003281000 nid=0x5fb8 
waiting on condition  [0x00007fa159256000]
      java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000070efd17e8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await([email protected]/AbstractQueuedSynchronizer.java:2081)
        at 
java.util.concurrent.LinkedBlockingQueue.take([email protected]/LinkedBlockingQueue.java:433)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
   
      Locked ownable synchronizers:
        - None
   
   "nioEventLoopGroup-3-1" #30 prio=10 os_prio=0 cpu=342.16ms elapsed=176.18s 
tid=0x0000000003282800 nid=0x5fb9 runnable  [0x00007fa159155000]
      java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPoll.wait([email protected]/Native Method)
        at 
sun.nio.ch.EPollSelectorImpl.doSelect([email protected]/EPollSelectorImpl.java:120)
        at 
sun.nio.ch.SelectorImpl.lockAndDoSelect([email protected]/SelectorImpl.java:124)
        - locked <0x000000070efcf058> (a 
io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x000000070efcee30> (a sun.nio.ch.EPollSelectorImpl)
        at 
sun.nio.ch.SelectorImpl.select([email protected]/SelectorImpl.java:141)
        at 
io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run([email protected]/Thread.java:834)
   
      Locked ownable synchronizers:
        - None
   
   "Attach Listener" #31 daemon prio=9 os_prio=0 cpu=0.74ms elapsed=0.14s 
tid=0x00000000019ed000 nid=0x6153 waiting on condition  [0x0000000000000000]
      java.lang.Thread.State: RUNNABLE
   
      Locked ownable synchronizers:
        - None
   
   "VM Thread" os_prio=0 cpu=154.07ms elapsed=179.54s tid=0x0000000000712800 
nid=0x5f94 runnable  
   
   "GC Thread#0" os_prio=0 cpu=28.68ms elapsed=179.56s tid=0x000000000067e800 
nid=0x5f91 runnable  
   
   "CMS Main Thread" os_prio=0 cpu=1010.50ms elapsed=179.55s 
tid=0x00000000006ee800 nid=0x5f93 runnable  
   
   "CMS Thread#0" os_prio=0 cpu=1.06ms elapsed=179.55s tid=0x00000000006ec000 
nid=0x5f92 runnable  
   
   "CMS Thread#1" os_prio=0 cpu=1.07ms elapsed=177.50s tid=0x0000000001d34000 
nid=0x5faa runnable  
   
   "VM Periodic Task Thread" os_prio=0 cpu=103.05ms elapsed=179.49s 
tid=0x0000000000873800 nid=0x5f9c waiting on condition  
   
   JNI global refs: 19, weak refs: 0`
   
   I & @keith-turner discussed this issue & looked at the jstack output, 
noticed there are 2 non-daemon threads (no. 27 & 30). As per the description on 
this 
[link](https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#setDaemon(boolean))
 - JVM exits when the only thread running are all daemon threads. 
   Based on which we wanted to check with ZK dev team if those 2 threads are 
expected to be non-daemon threads. 
   
   Let me know if anyone else noticed this issue?
   I'm using ZK 3.5.7, Accumulo 2.0, Hadoop 3.2.1. 
   
   Thanks
   Karthick

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to