[
https://issues.apache.org/jira/browse/HDDS-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309178#comment-17309178
]
Glen Geng commented on HDDS-5033:
---------------------------------
[~adoroszlai] Could you please confirm this issue?
> SCM may not be able to know full port list of Datanode after Datanode is
> restarted.
> -----------------------------------------------------------------------------------
>
> Key: HDDS-5033
> URL: https://issues.apache.org/jira/browse/HDDS-5033
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Affects Versions: 1.2.0
> Reporter: Glen Geng
> Priority: Major
> Attachments: 企业微信截图_097abd79-0ea4-487b-9b07-6bc2330385ef.png,
> 企业微信截图_c0bd5dde-98ee-4350-914d-2e0069ea8602.png
>
>
> Please check the attachments.
>
> After a DN restarts, SCM may not know the full port list of that DN.
>
> This issue cannot be resolved without restarting SCM. The consequence is that
> the Datanode cannot participate in any pipeline, and the DN throws NPEs
> continually.
> {code:java}
> 2021-03-25 15:04:16,322 [Command processor thread] ERROR
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine:
> Critical Error : Command processor thread encountered an error. Thread:
> Thread[Command processor thread,5,main]
> java.lang.NullPointerException
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.toRaftPeerAddress(RatisHelper.java:99)
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.raftPeerBuilderFor(RatisHelper.java:119)
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.toRaftPeer(RatisHelper.java:111)
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.newRaftGroup(RatisHelper.java:149)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler.handle(CreatePipelineCommandHandler.java:91)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
> at
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:506)
> at java.lang.Thread.run(Thread.java:748)
> 2021-03-25 15:04:16,323 [Command processor thread] ERROR
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine:
> Critical Error : Command processor thread encountered an error. Thread:
> Thread[Command processor thread,5,main]
> java.lang.NullPointerException
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.toRaftPeerAddress(RatisHelper.java:99)
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.raftPeerBuilderFor(RatisHelper.java:119)
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.toRaftPeer(RatisHelper.java:111)
> at
> org.apache.hadoop.hdds.ratis.RatisHelper.newRaftGroup(RatisHelper.java:149)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CreatePipelineCommandHandler.handle(CreatePipelineCommandHandler.java:91)
> at
> org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
> at
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$2(DatanodeStateMachine.java:506)
> at java.lang.Thread.run(Thread.java:748)
> {code}
>
> After restarting SCM, the issue is gone.
> The likely cause: SCMNodeManager records the DatanodeDetails only once,
> during registration.
> The DN, however, does not record the admin, server, and client ports into
> DatanodeDetails until its Ratis server is up.
> Thus there is a race here: if the register request is sent before the
> Ratis server is up, SCM won't know the full port list of that DN.
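
The ordering problem described in the report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Ozone code: `NodeDetails` and `NodeManager` are hypothetical stand-ins for DatanodeDetails and SCMNodeManager, the port names and numbers are made up, and the "record once" behaviour is modelled with `Map.putIfAbsent`.

```java
import java.util.HashMap;
import java.util.Map;

public class RegistrationRaceSketch {

  /** Hypothetical stand-in for DatanodeDetails: a plain name-to-port map. */
  static final class NodeDetails {
    private final Map<String, Integer> ports = new HashMap<>();

    void setPort(String name, int value) {
      ports.put(name, value);
    }

    /** Returns null when the port was never recorded. */
    Integer getPort(String name) {
      return ports.get(name);
    }

    NodeDetails copy() {
      NodeDetails c = new NodeDetails();
      c.ports.putAll(ports);
      return c;
    }
  }

  /** Hypothetical stand-in for SCMNodeManager: keeps only the first snapshot. */
  static final class NodeManager {
    private final Map<String, NodeDetails> registered = new HashMap<>();

    void register(String uuid, NodeDetails details) {
      // Assumption: details are recorded once and never refreshed afterwards.
      registered.putIfAbsent(uuid, details.copy());
    }

    Integer knownPort(String uuid, String name) {
      NodeDetails d = registered.get(uuid);
      return d == null ? null : d.getPort(name);
    }
  }

  /** Returns the admin port the manager ends up knowing for one ordering. */
  public static Integer adminPortSeenByScm(boolean ratisUpBeforeRegister) {
    NodeManager scm = new NodeManager();
    NodeDetails dn = new NodeDetails();
    dn.setPort("standalone", 9859);      // illustrative port, known at startup

    if (ratisUpBeforeRegister) {
      dn.setPort("admin", 9857);         // Ratis server is up: port recorded
    }
    scm.register("dn-1", dn);            // manager snapshots the details here
    if (!ratisUpBeforeRegister) {
      dn.setPort("admin", 9857);         // recorded too late for the snapshot
    }
    return scm.knownPort("dn-1", "admin");
  }

  public static void main(String[] args) {
    System.out.println("register after Ratis is up:  " + adminPortSeenByScm(true));
    System.out.println("register before Ratis is up: " + adminPortSeenByScm(false));
  }
}
```

In this model the second ordering leaves the manager with a `null` admin port, and nothing the node does later changes that. This mirrors the report's observation that only an SCM restart, which forces a fresh registration, clears the problem.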