devmadhuu commented on PR #6360:
URL: https://github.com/apache/ozone/pull/6360#issuecomment-2021927026

   > Thanks for taking up this usability issue @devmadhuu. I'm a bit worried 
about the current approach introducing other bugs in Recon and SCM that will be 
hard to detect. It is very difficult to tell if the nodes are saved to some 
other in-memory structure at the time of removal. Even just in the same 
`ReconNodeManager` class I see two maps that also contain datanode information 
that are not being updated. That said, I'm not sure of a better way to do it 
since Recon needs to persist this removed node information through restarts. Is 
there anything we can do to make sure that removing nodes does not corrupt any 
existing data structures?
   > 
   > Also I think nodes should get automatically re-added if they heartbeat to 
Recon. Especially since we don't have a way to manually re-add a node right now.
   
   @errose28 thanks for reviewing the patch. Here is my analysis; please check 
and confirm:
       1. SCMNodeManager: When a new node is registered, the following data 
structures get populated:
           - [ ] org.apache.hadoop.hdds.scm.node.SCMNodeManager#clusterMap
           - [ ] org.apache.hadoop.hdds.scm.node.NodeStateManager#nodeStateMap
           - [ ] org.apache.hadoop.hdds.scm.node.SCMNodeManager#dnsToUuidMap
   
   So when we remove a Datanode to stop tracking it, should we also clear the 
above data structures for the removed node?
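To make the question concrete, here is a minimal sketch of what "clearing all three structures together" could look like. `NodeRegistrySketch` and its fields are hypothetical stand-ins that only borrow the names `clusterMap`, `nodeStateMap` and `dnsToUuidMap`; the types and method signatures are assumptions and do not reflect the real `SCMNodeManager` or `NodeStateManager` code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for illustration only; not the real SCMNodeManager. */
class NodeRegistrySketch {
    // Simplified placeholders for the three structures named above.
    private final Set<UUID> clusterMap = new HashSet<>();
    private final Map<UUID, String> nodeStateMap = new HashMap<>();
    private final Map<String, Set<UUID>> dnsToUuidMap = new HashMap<>();

    /** Registration populates all three structures. */
    synchronized void register(UUID id, String hostName) {
        clusterMap.add(id);
        nodeStateMap.put(id, "HEALTHY");
        dnsToUuidMap.computeIfAbsent(hostName, h -> new HashSet<>()).add(id);
    }

    /**
     * Removal clears the same three structures under one lock, so no reader
     * sees a node that is present in one structure but missing in another.
     * Returns false if the node was not known.
     */
    synchronized boolean remove(UUID id, String hostName) {
        boolean known = nodeStateMap.remove(id) != null;
        clusterMap.remove(id);
        Set<UUID> ids = dnsToUuidMap.get(hostName);
        if (ids != null) {
            ids.remove(id);
            if (ids.isEmpty()) {
                // Drop the host entry once its last node is gone.
                dnsToUuidMap.remove(hostName);
            }
        }
        return known;
    }

    /** Useful in a test: is the node still referenced by any structure? */
    synchronized boolean isTrackedAnywhere(UUID id) {
        return clusterMap.contains(id)
            || nodeStateMap.containsKey(id)
            || dnsToUuidMap.values().stream().anyMatch(s -> s.contains(id));
    }
}
```

A check like `isTrackedAnywhere` in a unit test is one way to address the concern that some structure is silently left holding the removed node.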
   
   When a new node is added, NewNodeHandler is called, where we do the following:
           - [ ] PipelineManager calls 
org.apache.hadoop.hdds.scm.pipeline.PipelineManager#closeStalePipelines
   So on removing a node as well, we should call this method to clean up any 
stale pipelines associated with the removed DN.
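As a rough illustration of the intent, the sketch below closes every open pipeline that still lists the removed node as a member. `PipelineTrackerSketch` is a hypothetical class, and "close all pipelines containing the node" is an assumption about the desired outcome; it is not a description of what `closeStalePipelines` itself does, which would need to be confirmed against the `PipelineManager` implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for illustration only; not the real PipelineManager. */
class PipelineTrackerSketch {
    // Pipeline id -> member datanode UUIDs (simplified placeholder model).
    private final Map<String, Set<UUID>> openPipelines = new HashMap<>();

    void open(String pipelineId, Set<UUID> members) {
        openPipelines.put(pipelineId, new HashSet<>(members));
    }

    /** Closes every open pipeline that includes the removed node; returns their ids. */
    List<String> closePipelinesContaining(UUID removedNode) {
        List<String> closed = new ArrayList<>();
        openPipelines.entrySet().removeIf(entry -> {
            if (entry.getValue().contains(removedNode)) {
                closed.add(entry.getKey());
                return true;
            }
            return false;
        });
        return closed;
    }

    boolean isOpen(String pipelineId) {
        return openPipelines.containsKey(pipelineId);
    }
}
```

Returning the closed pipeline ids makes it straightforward to log them or assert on them in a test of the remove API.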
   
   If a node being removed is in the `DECOMMISSIONED` or `IN_MAINTENANCE` state 
but not in the `DEAD` state, then on the remove API call, should we call 
`org.apache.hadoop.hdds.scm.node.DeadNodeHandler#onMessage`? In 
DeadNodeHandler, we do the following cleanup:
           - [ ] Close containers associated with Datanode
           - [ ] Destroy pipelines associated with Datanode.
           - [ ] Remove the container replicas associated with Datanode.
           - [ ] Remove commands in command queue for the Datanode.
           - [ ] Remove DeleteBlocksCommand associated with the Datanode.
           - [ ] Move dead datanode out of ClusterNetworkTopology.
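
One possible answer to the question above, sketched under assumptions: if the dead-node cleanup has already run for the node, the remove call skips it; otherwise it runs the same six steps, in the order listed, before the node is dropped. `RemovalCleanupSketch`, its `Step` enum and the `deadCleanupAlreadyRan` flag are all hypothetical names; the real `DeadNodeHandler` may order or gate these steps differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the real DeadNodeHandler. */
class RemovalCleanupSketch {
    // The six cleanup actions from the checklist above, in listed order.
    enum Step {
        CLOSE_CONTAINERS,
        DESTROY_PIPELINES,
        REMOVE_CONTAINER_REPLICAS,
        CLEAR_COMMAND_QUEUE,
        REMOVE_DELETE_BLOCKS_COMMANDS,
        REMOVE_FROM_TOPOLOGY
    }

    /**
     * Returns the cleanup steps to run before removing a node.
     * Assumption: a node that already went through dead-node cleanup
     * needs none of them again.
     */
    static List<Step> stepsBeforeRemoval(boolean deadCleanupAlreadyRan) {
        if (deadCleanupAlreadyRan) {
            return Collections.emptyList();
        }
        return new ArrayList<>(Arrays.asList(Step.values()));
    }
}
```

With this shape, a node in `DECOMMISSIONED` or `IN_MAINTENANCE` that never became `DEAD` would get the full cleanup on removal, while an already dead node would not repeat it.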
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

