sodonnel commented on PR #4968: URL: https://github.com/apache/ozone/pull/4968#issuecomment-1604555171
There are two things which concern me about this change: 1. It is fine for EC pipeline to be built and never added to the manager, but for Ratis, it is not. Building a Ratis pipeline actually sends messages to the DNs to establish the Ratis ring. Therefore it might make sense to throw an error in a call comes into BuildPipeline with a Ratis replicationConfig. 2. I was worried about a race condition where we create a pipeline, and then some of the nodes go stale or enter some maintenance state. Both of those events trigger the closing of the pipelines on the affect nodes via the StaleNodeHandler and the StartDatanodeAdminHandler. In both cases, these handlers get the pipeline list on the datanode from NodeManager, which stores it in a concurrent hash map. So even before this change, there is a chance that a pipeline is created, right as the node state changes, and the staleNodeHandler or StartDatanodeAdminHandler will miss the pipeline as it has not made it into the nodeManager map yet. For stale nodes, it would get caught when it goes dead. For decommission / maintenance, I believe it would be a problem (this problem already exists in theory and is no worse due to this change) as the DatanodeAdminMonitorImpl would keep waiting until the pipelines get closed. It would make sense to do something in DatanodeAdminMonitorImpl to trigger ano ther close if it finds pipelines still open. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
