devmadhuu commented on PR #6360:
URL: https://github.com/apache/ozone/pull/6360#issuecomment-2021927026
> Thanks for taking up this usability issue @devmadhuu. I'm a bit worried about the current approach introducing other bugs in Recon and SCM that will be hard to detect. It is very difficult to tell if the nodes are saved to some other in-memory structure at the time of removal. Even just in the same `ReconNodeManager` class I see two maps that also contain datanode information that are not being updated. That said, I'm not sure of a better way to do it since Recon needs to persist this removed node information through restarts. Is there anything we can do to make sure that removing nodes does not corrupt any existing data structures?
>
> Also I think nodes should get automatically re-added if they heartbeat to Recon. Especially since we don't have a way to manually re-add a node right now.

@errose28, thanks for reviewing the patch. Here is my analysis; please check and confirm:
1. SCMNodeManager: When a new node is registered, the following data structures get populated:
- [ ] org.apache.hadoop.hdds.scm.node.SCMNodeManager#clusterMap
- [ ] org.apache.hadoop.hdds.scm.node.NodeStateManager#nodeStateMap
- [ ] org.apache.hadoop.hdds.scm.node.SCMNodeManager#dnsToUuidMap
So when we remove a Datanode to stop tracking it, should we clear the above data structures for the removed node?
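A minimal sketch of what this could look like, under simplifying assumptions: `NodeRegistrySketch`, `onHeartbeat` and `isConsistent` are hypothetical names, and the three maps are plain `HashMap` stand-ins for `clusterMap`, `nodeStateMap` and `dnsToUuidMap`, not the real Ozone types. Registration and removal touch all three under one lock, an invariant check verifies that they agree on the set of known nodes (one possible answer to the concern about corrupting existing structures), and an unknown node that heartbeats is re-registered, as suggested above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for the per-node state kept by SCMNodeManager and
 * NodeStateManager. Field names mirror the ones listed above; the types are
 * simplified assumptions, not the real Ozone classes.
 */
class NodeRegistrySketch {
  enum NodeState { HEALTHY, STALE, DEAD }

  // uuid -> network location (simplified stand-in for clusterMap)
  private final Map<String, String> clusterMap = new HashMap<>();
  // uuid -> health state (simplified stand-in for nodeStateMap)
  private final Map<String, NodeState> nodeStateMap = new HashMap<>();
  // hostname or IP -> uuids (simplified stand-in for dnsToUuidMap)
  private final Map<String, Set<String>> dnsToUuidMap = new HashMap<>();

  /** Populates all three structures together, as registration does. */
  synchronized void register(String uuid, String host, String location) {
    clusterMap.put(uuid, location);
    nodeStateMap.put(uuid, NodeState.HEALTHY);
    dnsToUuidMap.computeIfAbsent(host, h -> new HashSet<>()).add(uuid);
  }

  /** Clears the node from every structure under the same lock. */
  synchronized boolean remove(String uuid) {
    boolean known = nodeStateMap.remove(uuid) != null;
    clusterMap.remove(uuid);
    // Drop the uuid from every host entry, then drop hosts with no uuids left.
    dnsToUuidMap.values().forEach(uuids -> uuids.remove(uuid));
    dnsToUuidMap.values().removeIf(Set::isEmpty);
    return known;
  }

  /** Hypothetical heartbeat hook: an unknown node is registered again. */
  synchronized void onHeartbeat(String uuid, String host, String location) {
    if (!nodeStateMap.containsKey(uuid)) {
      register(uuid, host, location);
    }
  }

  synchronized boolean contains(String uuid) {
    return nodeStateMap.containsKey(uuid);
  }

  /** Invariant: every structure agrees on which uuids exist. */
  synchronized boolean isConsistent() {
    Set<String> fromDns = new HashSet<>();
    dnsToUuidMap.values().forEach(fromDns::addAll);
    return clusterMap.keySet().equals(nodeStateMap.keySet())
        && nodeStateMap.keySet().equals(fromDns);
  }
}
```

In a real patch, a check like `isConsistent()` would probably fit best in unit tests that run add, remove and re-add sequences, rather than on the heartbeat path.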
When a new node is added, NewNodeHandler gets called, and it does the following:
- [ ] PipelineManager calls org.apache.hadoop.hdds.scm.pipeline.PipelineManager#closeStalePipelines
So when removing a node as well, we should call this method to clean up any stale pipelines associated with the removed DN.
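A rough sketch of the pipeline side, assuming a simplified in-memory pipeline table: `PipelineCleanupSketch` and `closePipelinesOf` are hypothetical and are not the `PipelineManager` API, and the real `closeStalePipelines` may match nodes differently. On removal, every open pipeline that still lists the removed node is closed, and calling it again is a no-op.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified pipeline table; not the Ozone PipelineManager API. */
class PipelineCleanupSketch {
  enum PipelineState { OPEN, CLOSED }

  static final class Pipeline {
    final String id;
    final Set<String> members; // datanode uuids in this pipeline
    PipelineState state = PipelineState.OPEN;

    Pipeline(String id, Set<String> members) {
      this.id = id;
      this.members = members;
    }
  }

  // Insertion-ordered so the list of closed pipeline ids is deterministic.
  private final Map<String, Pipeline> pipelines = new LinkedHashMap<>();

  void add(String id, Set<String> members) {
    pipelines.put(id, new Pipeline(id, members));
  }

  /** Closes every open pipeline that still references the removed node. */
  List<String> closePipelinesOf(String removedUuid) {
    List<String> closed = new ArrayList<>();
    for (Pipeline p : pipelines.values()) {
      if (p.state == PipelineState.OPEN && p.members.contains(removedUuid)) {
        p.state = PipelineState.CLOSED;
        closed.add(p.id);
      }
    }
    return closed;
  }

  PipelineState stateOf(String id) {
    return pipelines.get(id).state;
  }
}
```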
If a node being removed is in the `DECOMMISSIONED` or `IN_MAINTENANCE` state but not in the `DEAD` state, should the remove API call org.apache.hadoop.hdds.scm.node.DeadNodeHandler#onMessage? DeadNodeHandler already performs the following cleanup:
- [ ] Close containers associated with the Datanode.
- [ ] Destroy pipelines associated with the Datanode.
- [ ] Remove the container replicas associated with the Datanode.
- [ ] Remove commands in the command queue for the Datanode.
- [ ] Remove DeleteBlocksCommand associated with the Datanode.
- [ ] Move the dead Datanode out of ClusterNetworkTopology.
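To make the proposed decision explicit, here is a hypothetical planner; `RemovalPlanSketch`, its enums and its rules are assumptions, not Ozone code. A node that is already `DEAD` is assumed to have gone through DeadNodeHandler, so only the node-manager maps still need clearing; a `DECOMMISSIONED` or `IN_MAINTENANCE` node that is not dead gets the same cleanup steps listed above first; any other node is rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical removal planner; the enums are simplified assumptions. */
class RemovalPlanSketch {
  enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, ENTERING_MAINTENANCE, IN_MAINTENANCE }

  enum Health { HEALTHY, STALE, DEAD }

  // Mirrors the DeadNodeHandler cleanup items listed above, in that order,
  // followed by clearing the SCMNodeManager/NodeStateManager maps.
  enum Step {
    CLOSE_CONTAINERS, DESTROY_PIPELINES, REMOVE_CONTAINER_REPLICAS,
    CLEAR_COMMAND_QUEUE, REMOVE_DELETE_BLOCKS_COMMANDS, REMOVE_FROM_TOPOLOGY,
    CLEAR_NODE_MANAGER_MAPS
  }

  private static final Set<OpState> REMOVABLE =
      EnumSet.of(OpState.DECOMMISSIONED, OpState.IN_MAINTENANCE);

  /** Returns the ordered cleanup steps for removing a node, or throws. */
  static List<Step> plan(OpState op, Health health) {
    if (health != Health.DEAD && !REMOVABLE.contains(op)) {
      throw new IllegalStateException("node is still in service: " + op + "/" + health);
    }
    List<Step> steps = new ArrayList<>();
    if (health != Health.DEAD) {
      // Assumption: a DEAD node was already handled by DeadNodeHandler, so the
      // dead-node cleanup is only added for nodes that are not dead yet.
      for (Step s : Step.values()) {
        if (s != Step.CLEAR_NODE_MANAGER_MAPS) {
          steps.add(s);
        }
      }
    }
    steps.add(Step.CLEAR_NODE_MANAGER_MAPS);
    return Collections.unmodifiableList(steps);
  }
}
```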