Hi Enrico,

Yes, I have already run this through Accumulo folks they have looked at the
jstack output & advised to check with ZK devs if those 2 threads (#27 &
#30) are expected to be non-daemon threads.
Also, in this cluster we have wire encryption enabled only for ZK and by
disabling it we don't encounter this issue.
There are no error messages reported on the ZK server log, below are the
INFO messages when running the "accumulo-service master start" command

2020-03-31 12:16:28,626 [myid:2] - INFO
[nioEventLoopGroup-7-5:X509AuthenticationProvider@172] - Authenticated Id
'CN=host2' for Scheme 'x509'

2020-03-31 12:16:28,676 [myid:2] - INFO
[nioEventLoopGroup-7-5:ZooKeeperServer@1095] - got auth packet /<host2
IP>:46332

2020-03-31 12:16:28,676 [myid:2] - INFO
[nioEventLoopGroup-7-5:ZooKeeperServer@1113] - auth success /<host2
IP>:46332

This issue is reproducible everytime I start Accumulo master. Let me know
for any further details?

Many thanks

Regards,
Karthick







On Tue, 31 Mar 2020 at 07:23, Enrico Olivelli <eolive...@gmail.com> wrote:

> Hi,
> Did you check with Accumulo community?
> Do you see errors or informational messages in ZK server logs?
>
> Enrico
>
> Il Mar 31 Mar 2020, 01:12 karthick rn <karthick.narend...@gmail.com> ha
> scritto:
>
> > Hello dev team,
> >
> > We are using Hadoop, Accumulo & Zookeeper in our environment, after
> > enabling TLS for ZK we noticed that starting Accumulo master service
> hangs
> > in an intermediate process as shown below and require to kill the process
> > in-order for Accumulo master to start.
> >
> > [user1@host1 ~]$ jps -m
> >
> > 23314 JournalNode
> >
> > 23011 NameNode
> >
> > 23539 DFSZKFailoverController
> >
> > *84118 Main org.apache.accumulo.master.state.SetGoalState NORMAL*
> >
> > 22590 QuorumPeerMain
> >
> > 89790 Jps -m
> >
> >
> > [user1@host1 ~]$ *kill -9 84118*
> >
> > [user1@host1 ~]$ jps -m
> >
> > 23314 JournalNode
> >
> > 23011 NameNode
> >
> > 23539 DFSZKFailoverController
> >
> > 89892 Jps -m
> >
> > *89847 Main master*
> >
> > 22590 QuorumPeerMain
> >
> > [user1@host1 ~]$
> >
> > Jstack collected during the hang shows 2 non-daemon threads (#27 & #30)
> > while the rest are daemon threads. Would like to check with the dev team
> if
> > "nioEventLoopGroup" threads are expected to be non-daemon? If so, any
> > thoughts on what else might be causing the issue?
> > I have copied only a portion of the jstack output, let me know in-case
> you
> > need the full output. Fyi, I'm using Apache Zookeeper 3.5.7, Hadoop
> 3.2.1 &
> > Accumulo 2.0. Let me know if you need any further details? Many thanks
> >
> > "org.apache.accumulo.master.state.SetGoalState-SendThread(host1:2281)"
> #25
> > daemon prio=5 os_prio=0 cpu=127.90ms elapsed=95.38s
> tid=0x0000000003a10800
> > nid=0x1624e waiting on condition  [0x00007f5c7bd67000]
> >    java.lang.Thread.State: TIMED_WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f0acf38> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > /LockSupport.java:234)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2123)
> > at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > /LinkedBlockingDeque.java:513)
> > at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > /LinkedBlockingDeque.java:675)
> > at
> >
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "org.apache.accumulo.master.state.SetGoalState-EventThread" #26 daemon
> > prio=5 os_prio=0 cpu=0.88ms elapsed=95.38s tid=0x0000000003a16800
> > nid=0x1624f waiting on condition  [0x00007f5c7bc66000]
> >    java.lang.Thread.State: WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f0f8af0> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > /LockSupport.java:194)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2081)
> > at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > /LinkedBlockingQueue.java:433)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "*nioEventLoopGroup-2-1*" #27 prio=10 os_prio=0 cpu=718.22ms
> elapsed=95.28s
> > tid=0x0000000003cd2800 nid=0x16250 runnable  [0x00007f5c7d6b9000]
> >    java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > /EPollSelectorImpl.java:120)
> > at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > /SelectorImpl.java:124)
> > - locked <0x000000070f079a28> (a
> > io.netty.channel.nio.SelectedSelectionKeySet)
> > - locked <0x000000070f06ca80> (a sun.nio.ch.EPollSelectorImpl)
> > at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> /SelectorImpl.java:141)
> > at
> >
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > at
> >
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > at
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > at
> >
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "org.apache.accumulo.master.state.SetGoalState-SendThread(host2:2281)"
> #28
> > daemon prio=5 os_prio=0 cpu=8.27ms elapsed=94.31s tid=0x0000000005942800
> > nid=0x16259 waiting on condition  [0x00007f5c7b145000]
> >    java.lang.Thread.State: TIMED_WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f2acbc0> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6
> > /LockSupport.java:234)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2123)
> > at java.util.concurrent.LinkedBlockingDeque.pollFirst(java.base@11.0.6
> > /LinkedBlockingDeque.java:513)
> > at java.util.concurrent.LinkedBlockingDeque.poll(java.base@11.0.6
> > /LinkedBlockingDeque.java:675)
> > at
> >
> >
> org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:278)
> > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "org.apache.accumulo.master.state.SetGoalState-EventThread" #29 daemon
> > prio=5 os_prio=0 cpu=0.25ms elapsed=94.31s tid=0x0000000005943800
> > nid=0x1625a waiting on condition  [0x00007f5c7b044000]
> >    java.lang.Thread.State: WAITING (parking)
> > at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
> > - parking to wait for  <0x000000070f2adff8> (a
> > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6
> > /LockSupport.java:194)
> > at
> >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6
> > /AbstractQueuedSynchronizer.java:2081)
> > at java.util.concurrent.LinkedBlockingQueue.take(java.base@11.0.6
> > /LinkedBlockingQueue.java:433)
> > at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> >
> >    Locked ownable synchronizers:
> > - None
> >
> > "*nioEventLoopGroup-3-1*" #30 prio=10 os_prio=0 cpu=297.70ms
> elapsed=94.30s
> > tid=0x00000000044af800 nid=0x1625b runnable  [0x00007f5c7b646000]
> >    java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.EPoll.wait(java.base@11.0.6/Native Method)
> > at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.6
> > /EPollSelectorImpl.java:120)
> > at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.6
> > /SelectorImpl.java:124)
> > - locked <0x000000070f2ab868> (a
> > io.netty.channel.nio.SelectedSelectionKeySet)
> > - locked <0x000000070f2ab640> (a sun.nio.ch.EPollSelectorImpl)
> > at sun.nio.ch.SelectorImpl.select(java.base@11.0.6
> /SelectorImpl.java:141)
> > at
> >
> >
> io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:68)
> > at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
> > at
> >
> >
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> > at
> > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > at
> >
> >
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> >
> >    Locked ownable synchronizers:
> > - None
> >
>

Reply via email to