slfan1989 commented on PR #7542: URL: https://github.com/apache/ozone/pull/7542#issuecomment-2579090673
> Have all the unhealthy replicas used up all available DNs on the cluster, so that there are no other free hosts to create a healthy copy?
>
> What error did you see when the reconstruction failed? There have been some bugs fixed around reconstruction over time, which could cause a container to not get recovered.
>
> If the container is not under-replicated (i.e. all the replicas are there and healthy), then the unhealthy ones should be cleaned up by the ClosedWithUnhealthyReplicasHandler, which should mark the container over-replicated and then the unhealthy ones should be deleted. From the javadoc:

@sodonnel Thank you very much for your response!

Because of necessary upgrades and modifications to the server room racks, we had to take a large number of machines offline. After our discussion, we adjusted some of the SCM and DN configs, and so far everything is behaving as expected.

The main cause of the issue I described earlier was some of our own custom modifications, which made EC reconstruction time out. As a result, SCM kept selecting new DNs as reconstruction targets. Fortunately, we caught the issue before every DN had been tried, and we have already implemented some fixes.

I think we should limit the number of reconstruction attempts for an EC container, perhaps to 3. Once a container exceeds that limit, SCM should stop reconstructing it until a cluster administrator intervenes. I would like to hear your thoughts on this.
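A minimal sketch of the proposed cap, written in Java since Ozone is a Java project. Everything in it is hypothetical: `ReconstructionAttemptTracker`, its methods, and the in-memory per-container counter are illustrative names only, not part of Ozone's SCM or ReplicationManager API. It assumes an attempt is counted when it is scheduled (so a timed-out reconstruction counts the same as an explicit failure), and that either a successful reconstruction or an administrator reset clears the counter.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not an Ozone API): caps how many times a single EC
 * container may be scheduled for reconstruction before it is parked until
 * an administrator resets it.
 */
public class ReconstructionAttemptTracker {
  private final int maxAttempts;
  // containerId -> attempts scheduled since the last success or admin reset
  private final Map<Long, Integer> attempts = new HashMap<>();

  public ReconstructionAttemptTracker(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    this.maxAttempts = maxAttempts;
  }

  /**
   * Called before scheduling a reconstruction. Records the attempt and returns
   * true while the container is under the cap; once the cap is reached it
   * returns false and records nothing. Counting up front covers timeouts,
   * since a timed-out attempt may never report a failure.
   */
  public synchronized boolean tryStartAttempt(long containerId) {
    int current = attempts.getOrDefault(containerId, 0);
    if (current >= maxAttempts) {
      return false;
    }
    attempts.put(containerId, current + 1);
    return true;
  }

  /** A successful reconstruction clears the counter for that container. */
  public synchronized void onReconstructionSucceeded(long containerId) {
    attempts.remove(containerId);
  }

  /** Administrator intervention: allow reconstruction of this container again. */
  public synchronized void adminReset(long containerId) {
    attempts.remove(containerId);
  }

  /** True once the container has used up its attempts. */
  public synchronized boolean isBlocked(long containerId) {
    return attempts.getOrDefault(containerId, 0) >= maxAttempts;
  }

  public synchronized int attemptsFor(long containerId) {
    return attempts.getOrDefault(containerId, 0);
  }

  public static void main(String[] args) {
    ReconstructionAttemptTracker tracker = new ReconstructionAttemptTracker(3);
    long containerId = 42L;
    // Simulate five scheduling rounds in which every attempt times out.
    for (int round = 1; round <= 5; round++) {
      boolean allowed = tracker.tryStartAttempt(containerId);
      System.out.println("round " + round + " reconstruction allowed: " + allowed);
    }
    System.out.println("blocked: " + tracker.isBlocked(containerId));
    tracker.adminReset(containerId);
    System.out.println("blocked after admin reset: " + tracker.isBlocked(containerId));
  }
}
```

With a cap of 3, rounds 1 to 3 are allowed and rounds 4 and 5 are refused until the reset. A real implementation would also have to decide whether these counters survive SCM restarts and failovers, which this in-memory sketch ignores.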
