slfan1989 commented on PR #7542:
URL: https://github.com/apache/ozone/pull/7542#issuecomment-2579090673

   > Have all the unhealthy replicas used up all available DNs on the cluster, so that there are no other free hosts to create a healthy copy?
   > 
   > What error did you see when the reconstruction failed? There have been some bugs fixed around reconstruction over time, which could cause a container to not get recovered.
   > 
   > If the container is not under-replicated (i.e. all the replicas are there and healthy), then the unhealthy ones should be cleaned up by the ClosedWithUnhealthyReplicasHandler, which should mark the container over-replicated and then the unhealthy ones should be deleted. From the javadoc:
   
   @sodonnel Thank you very much for your response! Due to some necessary 
upgrades and modifications to the server room racks, we had to take a large 
number of machines offline. After our discussion, we adjusted some of the SCM 
and DN config, and so far, everything seems to be meeting expectations.
   
   The main cause of the issue I described earlier was some of our custom 
modifications, which caused EC reconstruction to time out. As a result, SCM 
kept selecting new DNs as reconstruction targets. Fortunately, we identified 
the issue before every DN had been tried, and we have already implemented 
some fixes.
   
   I think we should set a limit on the number of reconstruction attempts for 
EC containers, perhaps 3 times. If the attempts exceed 3, we should stop 
further reconstruction of that container until the cluster administrator 
intervenes. I would like to hear your thoughts on this.
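
   To make the proposal concrete, here is a rough sketch of a per-container 
attempt limiter. It is not existing Ozone code: the class 
`ReconstructionAttemptLimiter`, its method names, and the use of a plain `long` 
container ID are all hypothetical, and where such a check would hook into SCM 
is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Ozone): counts EC reconstruction
 * attempts per container and refuses new ones once a limit is reached.
 */
final class ReconstructionAttemptLimiter {
  private final int maxAttempts;
  // containerId -> number of reconstruction attempts already scheduled
  private final Map<Long, Integer> attempts = new ConcurrentHashMap<>();

  ReconstructionAttemptLimiter(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
  }

  /**
   * Records one more attempt if the container is still under the limit.
   * Returns false, and records nothing, once the limit has been reached.
   */
  boolean tryAcquire(long containerId) {
    boolean[] allowed = {false};
    // compute() runs atomically per key, so concurrent callers cannot
    // push the counter past maxAttempts.
    attempts.compute(containerId, (id, used) -> {
      int count = (used == null) ? 0 : used;
      if (count < maxAttempts) {
        allowed[0] = true;
        return count + 1;
      }
      return count;
    });
    return allowed[0];
  }

  /** True if the container has used up all of its attempts. */
  boolean isBlocked(long containerId) {
    return attempts.getOrDefault(containerId, 0) >= maxAttempts;
  }

  /**
   * Clears the counter, either after a successful reconstruction or
   * when an administrator explicitly re-enables the container.
   */
  void reset(long containerId) {
    attempts.remove(containerId);
  }
}
```

   With a limit of 3, the fourth `tryAcquire` call for the same container 
returns false until `reset` is called for it, which is where the administrator 
intervention would fit in. Other containers are unaffected.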
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

