[
https://issues.apache.org/jira/browse/HDFS-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007576#comment-16007576
]
Doris Gu commented on HDFS-11753:
---------------------------------
I ran more tests over the past few days. My conclusions about multiple JN daemons are:
*1. Branch-2 (e.g. 2.8.1, 2.7.3, 2.6.0) does have the bug*
{code:title=A. First, a normal environment.|borderStyle=solid}
hdfs@localhost:~> jps
46453 DFSZKFailoverController
5119 Jps
4311 JournalNode
46859 NameNode
46888 DataNode
{code}
{code:title=B. Start the JN once more: it hangs, while the NN and DN do not have this problem.|borderStyle=solid}
hdfs@localhost:~> hdfs journalnode
2017-05-12 10:32:17,291 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JournalNode
STARTUP_MSG: host = localhost/127.0.0.1
STARTUP_MSG: args = []
......
2017-05-12 10:32:18,571 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: 0.0.0.0:8480
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:69)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:163)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:137)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:310)
Caused by: java.net.BindException: address in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
	... 7 more
Exception in thread "main" java.net.BindException: Port in use: 0.0.0.0:8480
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:69)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:163)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:137)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:310)
Caused by: java.net.BindException: address in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:914)
	... 7 more
{code}
{code:title=C. Result: multiple JN daemons.|borderStyle=solid}
hdfs@localhost:~> jps
45930 JournalNode
46453 DFSZKFailoverController
4311 JournalNode
46305 Jps
46859 NameNode
46888 DataNode
{code}
{code:title=Appendix. Thread dump of the abnormal JN.|borderStyle=solid}
hdfs@localhost:~> jstack 45930
2017-05-12 10:42:52
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f87e478b800 nid=0x110a3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"DestroyJavaVM" prio=10 tid=0x00007f87e400f000 nid=0xb392 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1" prio=10 tid=0x00007f87e4a2e000 nid=0xb3ae waiting on condition [0x00007f87d9a92000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000ef60b7a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

"Timer for 'JournalNode' metrics system" daemon prio=10 tid=0x00007f87e4896000 nid=0xb3ac in Object.wait() [0x00007f87d9de4000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000ed889a90> (a java.util.TaskQueue)
	at java.util.TimerThread.mainLoop(Timer.java:552)
	- locked <0x00000000ed889a90> (a java.util.TaskQueue)
	at java.util.TimerThread.run(Timer.java:505)

"Service Thread" daemon prio=10 tid=0x00007f87e40a8000 nid=0xb3a2 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f87e40a5800 nid=0xb3a1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f87e40a2800 nid=0xb3a0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f87e4098800 nid=0xb39f runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

......
{code}
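The thread dump suggests why the second JN never exits. The presence of "DestroyJavaVM" indicates that main has already ended (here, by throwing) and the JVM is waiting for the remaining non-daemon threads. "pool-1-thread-1" is not marked daemon, is parked in ScheduledThreadPoolExecutor$DelayedWorkQueue.take, and follows the naming pattern of the JDK's Executors.defaultThreadFactory(), which creates non-daemon threads. The minimal sketch below reproduces that one ingredient in isolation. Which JournalNode component actually owns this pool is not visible in the dump, so LingeringPoolDemo and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical miniature of the lingering-JVM condition seen in the thread dump.
final class LingeringPoolDemo {

    // Creates a scheduled pool with the JDK default thread factory, whose
    // worker threads are named "pool-N-thread-M" and are NOT daemon threads.
    static ScheduledExecutorService startHousekeeping() {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        // A periodic no-op task: between runs the worker parks in
        // DelayedWorkQueue.take(), like "pool-1-thread-1" in the dump.
        pool.scheduleAtFixedRate(() -> { }, 0, 1, TimeUnit.SECONDS);
        return pool;
    }

    // Asks the pool's own worker thread whether it is a daemon thread.
    static boolean workerIsDaemon(ScheduledExecutorService pool) throws Exception {
        return pool.submit(() -> Thread.currentThread().isDaemon()).get();
    }

    // Returns the worker thread's name, for comparison with jstack output.
    static String workerName(ScheduledExecutorService pool) throws Exception {
        return pool.submit(() -> Thread.currentThread().getName()).get();
    }
}
```

Calling startHousekeeping() from a main that then throws reproduces the symptom: the stack trace is printed, but the JVM keeps running until the pool is shut down or the process is killed.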
These exceptions should be caught so that the later JN daemon exits. With my patch applied, starting the JN once more makes it exit.
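A minimal sketch of the "catch the start-up failure and exit" idea, assuming the goal is simply to log the error and exit with a non-zero status. It is not the attached patch: GuardedJournalNodeMain, bindHttpPort and run are hypothetical names, and the JN HTTP server is reduced to a plain ServerSocket bound on the loopback address rather than 0.0.0.0. Inside Hadoop the exit would more naturally go through ExitUtil.terminate, which the NameNode's main uses; plain System.exit keeps the sketch self-contained.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch: turn a start-up failure into a clean, non-zero exit.
final class GuardedJournalNodeMain {

    // Stand-in for the JN HTTP server start-up: bind a listening socket.
    static ServerSocket bindHttpPort(int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the unbound socket
            throw e;
        }
    }

    // Runs start-up and maps any failure to exit status 1
    // instead of letting the exception escape main().
    static int run(int port) {
        try (ServerSocket http = bindHttpPort(port)) {
            // A real daemon would keep serving here; the sketch returns at once.
            return 0;
        } catch (Exception e) {
            System.err.println("Failed to start journalnode: " + e);
            return 1;
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8480;
        // Exit explicitly, so a leftover non-daemon thread cannot keep the JVM alive.
        System.exit(run(port));
    }
}
```

The explicit exit is the important design choice: catching the exception alone is not enough if some non-daemon thread has already been started.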
*2. Trunk's excellent shell script rewrite already prevents a second JN daemon from being started*
{code:title=The shell script refuses to start a second JN.|borderStyle=solid}
root:~/version/hadoop-3.0.0-alpha2/bin$ ./hdfs journalnode
journalnode is running as process 30273. Stop it first.
{code}
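The trunk protection essentially checks a pid file before launching. The sketch below is a hypothetical Java rendering of that kind of check (the real one lives in the shell scripts; PidFileGuard and its methods are illustrative names, and ProcessHandle requires Java 9 or later). It also makes the limitation visible: the guard consults only one pid file, so an instance started with a different pid directory, as in the original description of this issue, passes straight through and only the port conflict stops it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical Java rendering of a pid-file guard; not the trunk shell code.
final class PidFileGuard {

    // Returns the PID stored in pidFile if that process is still alive.
    static Optional<Long> livePid(Path pidFile) {
        if (!Files.isRegularFile(pidFile)) {
            return Optional.empty();
        }
        try {
            String text = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
            return ProcessHandle.of(Long.parseLong(text))
                    .filter(ProcessHandle::isAlive)
                    .map(ProcessHandle::pid);
        } catch (IOException | NumberFormatException e) {
            return Optional.empty(); // unreadable or malformed file: treat as stale
        }
    }

    // Returns an error message if the daemon is already running,
    // or empty if start-up may proceed.
    static Optional<String> refuseIfRunning(String daemon, Path pidFile) {
        return livePid(pidFile)
                .map(pid -> daemon + " is running as process " + pid + ". Stop it first.");
    }
}
```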
Even so, I still think the JournalNode itself should catch these exceptions, as the NameNode and DataNode do; that would be more robust.
Given all of the above, I have split the usage (-h) part out into HDFS-11806 and modified this issue to focus on the multiple JournalNode daemons problem.
> Make Some Enhancements about JournalNode Daemon
> ------------------------------------------------
>
> Key: HDFS-11753
> URL: https://issues.apache.org/jira/browse/HDFS-11753
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: journal-node
> Affects Versions: 3.0.0-alpha2
> Reporter: Doris Gu
> Attachments: HDFS-11753.001.patch
>
>
> 1. Add support for -h. Right now, if I run *hdfs journalnode -h*, it directly starts the journalnode daemon, but generally speaking I just want to see the usage.
> 2. Add exception catching and termination. If I start the journalnode with different directories for storing pids, I get several journalnode daemons, none of which work, because they are all configured with the same port.
> {quote}[hdfs@localhost ~]$ jps
> *10107 JournalNode*
> *46023 JournalNode*
> 57944 NameNode
> 46539 Jps
> 57651 DFSZKFailoverController
> 57909 DataNode
> *57739 JournalNode*
> *45721 JournalNode*{quote}
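Item 1 of the quoted description (now tracked in HDFS-11806) amounts to a small argument check before start-up. A minimal sketch, assuming only a single help flag should be intercepted and everything else is passed on to the daemon; the class name, usage string and accepted spellings are hypothetical.

```java
// Hypothetical sketch of intercepting a help flag before the daemon starts.
final class JournalNodeHelp {

    static final String USAGE = "Usage: hdfs journalnode [-h]";

    // True only when the sole argument is a help flag, so that
    // other argument lists still reach the daemon unchanged.
    static boolean isHelpRequest(String[] args) {
        if (args.length != 1) {
            return false;
        }
        String a = args[0];
        return a.equals("-h") || a.equals("-help") || a.equals("--help");
    }

    public static void main(String[] args) {
        if (isHelpRequest(args)) {
            System.out.println(USAGE);
            return; // print usage and do not start the daemon
        }
        // ... start the JournalNode daemon here ...
    }
}
```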
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)