[
https://issues.apache.org/jira/browse/HDFS-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083229#comment-16083229
]
Anu Engineer commented on HDFS-12098:
-------------------------------------
@Weiwei Yang, can you please share your repro steps once again, or look at
the test patch that I have created?
I have added a call to disable the SCM; when the tests run, I can see that we
do not hit the SCM.
{code}
java.net.SocketTimeoutException: Call From hw11767.home/192.168.29.224 to
0.0.0.0:58880 failed on socket timeout exception:
java.net.SocketTimeoutException: 1000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels
{code}
However, I am not able to see many Datanode state machine threads. Please see
the attached snapshot from my profiler.
I have also attached a test case that I developed to simulate and debug this
case.
Thanks
Anu
> Ozone: Datanode is unable to register with scm if scm starts later
> ------------------------------------------------------------------
>
> Key: HDFS-12098
> URL: https://issues.apache.org/jira/browse/HDFS-12098
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, ozone, scm
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Critical
> Attachments: HDFS-12098-HDFS-7240.001.patch,
> HDFS-12098-HDFS-7240.002.patch, Screen Shot 2017-07-11 at 4.58.08 PM.png,
> thread_dump.log
>
>
> Reproducing steps
> # Start datanode
> # Wait and check the datanode state; it has connection issues, which is expected
> # Start SCM. The datanode is expected to connect to the SCM and the state
> machine to transition to RUNNING. In practice, however, its state transitions
> to SHUTDOWN and the datanode enters chill mode.
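
The behaviour the reproducing steps expect can be sketched roughly as follows. This is only an illustration, not the Ozone datanode state machine: the class `RegistrationSketch`, the method `register`, the `INIT` state, and the `scmReachable` probe are all hypothetical names introduced here. The sketch assumes that a failed probe (for example a socket timeout) is treated as transient, so the state stays at `INIT` rather than moving to `SHUTDOWN`, and moves to `RUNNING` once the SCM answers.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; not the actual Ozone datanode state machine. */
public class RegistrationSketch {

    /** RUNNING and SHUTDOWN come from the report; INIT is an assumed name. */
    public enum State { INIT, RUNNING, SHUTDOWN }

    /**
     * Probes the (hypothetical) SCM endpoint up to maxAttempts times.
     * A failed probe leaves the state at INIT so the next attempt can retry.
     */
    public static State register(BooleanSupplier scmReachable, int maxAttempts) {
        State state = State.INIT;
        for (int attempt = 1; attempt <= maxAttempts && state == State.INIT; attempt++) {
            if (scmReachable.getAsBoolean()) {
                state = State.RUNNING;
            }
            // Otherwise treat the failure as transient and try again.
        }
        return state;
    }

    public static void main(String[] args) {
        // Simulate an SCM that only becomes reachable on the third probe.
        int[] probes = {0};
        State result = register(() -> ++probes[0] >= 3, 5);
        System.out.println(result + " after " + probes[0] + " probes");
    }
}
```

If the SCM never becomes reachable within `maxAttempts`, this sketch returns `INIT`; whether the real datanode should keep retrying indefinitely or eventually give up is a design decision this example does not try to settle.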