sodonnel commented on PR #4968:
URL: https://github.com/apache/ozone/pull/4968#issuecomment-1604555171

   There are two things which concern me about this change:
   
   1. It is fine for EC pipeline to be built and never added to the manager, 
but for Ratis, it is not. Building a Ratis pipeline actually sends messages to 
the DNs to establish the Ratis ring. Therefore it might make sense to throw an 
error in a call comes into BuildPipeline with a Ratis replicationConfig.
   2. I was worried about a race condition where we create a pipeline, and then 
some of the nodes go stale or enter some maintenance state. Both of those 
events trigger the closing of the pipelines on the affect nodes via the 
StaleNodeHandler and the StartDatanodeAdminHandler. In both cases, these 
handlers get the pipeline list on the datanode from NodeManager, which stores 
it in a concurrent hash map. So even before this change, there is a chance that 
a pipeline is created, right as the node state changes, and the 
staleNodeHandler or StartDatanodeAdminHandler will miss the pipeline as it has 
not made it into the nodeManager map yet. For stale nodes, it would get caught 
when it goes dead. For decommission / maintenance, I believe it would be a 
problem (this problem already exists in theory and is no worse due to this 
change) as the DatanodeAdminMonitorImpl would keep waiting until the pipelines 
get closed. It would make sense to do something in DatanodeAdminMonitorImpl to 
trigger ano
 ther close if it finds pipelines still open.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to