adoroszlai opened a new pull request, #4285: URL: https://github.com/apache/ozone/pull/4285
## What changes were proposed in this pull request?

`UnhealthyReplicationProcessor#processAll` requeues any failed task. Requeued tasks are then attempted again in the same `processAll` call, before the loop exits, so a task that keeps failing can flood SCM logs until the cause of the error is resolved. While testing EC reconstruction read (the test is being added in HDDS-7982), this caused the GitHub Actions environment to [run out of disk space](https://github.com/adoroszlai/hadoop-ozone/actions/runs/4205417969/jobs/7297733162#step:5:1527) within a few minutes.

This PR proposes to collect failed container health results and requeue them only after exiting the loop.

https://issues.apache.org/jira/browse/HDDS-7989

## How was this patch tested?

Added unit test.

Also verified together with HDDS-7982 (which uncovered the problem without this fix):
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4207471575/jobs/7302558782

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4207414175
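The proposed change (collect failures during the loop, requeue them only afterwards) can be illustrated with a minimal, self-contained sketch. This is not the actual Ozone code: the class `DeferredRequeueSketch`, its generic `processAll` method, and the use of a plain `java.util.Queue` with a `Predicate` handler are all assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch, not UnhealthyReplicationProcessor itself.
public class DeferredRequeueSketch {

  /**
   * Drains the queue once. Items whose handler returns false or throws
   * are collected in a separate list and put back into the queue only
   * after the loop has finished, so each item is attempted at most once
   * per call.
   *
   * @return the number of items that failed and were requeued
   */
  static <T> int processAll(Queue<T> queue, Predicate<T> handler) {
    List<T> failed = new ArrayList<>();
    T item;
    while ((item = queue.poll()) != null) {
      boolean ok;
      try {
        ok = handler.test(item);
      } catch (RuntimeException e) {
        ok = false; // treat an exception as a failed attempt
      }
      if (!ok) {
        // Requeueing here would make the loop pick the item up again
        // immediately; defer it instead.
        failed.add(item);
      }
    }
    queue.addAll(failed); // requeue after exiting the loop
    return failed.size();
  }

  public static void main(String[] args) {
    Queue<String> queue =
        new ArrayDeque<>(Arrays.asList("c1", "c2", "c3"));
    // Hypothetical handler: container "c2" always fails.
    int failedCount = processAll(queue, c -> !c.equals("c2"));
    System.out.println("failed=" + failedCount + " remaining=" + queue);
  }
}
```

With this structure a permanently failing item is attempted once per `processAll` call and stays in the queue for the next call, instead of being retried in a tight loop.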
