adoroszlai opened a new pull request, #4285:
URL: https://github.com/apache/ozone/pull/4285

   ## What changes were proposed in this pull request?
   
   `UnhealthyReplicationProcessor#processAll` requeues any failed task.  Such 
tasks are retried within the same `processAll` call, before the loop exits, 
so a task that keeps failing is attempted over and over.  This can flood SCM 
logs until the cause of the error is resolved.
   
   This causes GitHub's environment to [run out of disk space](https://github.com/adoroszlai/hadoop-ozone/actions/runs/4205417969/jobs/7297733162#step:5:1527)
within a few minutes of running the EC reconstruction read test (the test is 
being added in HDDS-7982).
   
   This PR proposes to collect failed container health results and requeue them 
only after exiting the loop.
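
The deferred requeue idea can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Ozone code: the class `DeferredRequeueSketch`, the generic `processAll` signature, and the `Predicate` handler are all hypothetical stand-ins for the real replication queue and task handling.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch; names do not correspond to real Ozone classes.
public class DeferredRequeueSketch {

  /**
   * Drains the queue once. Items whose handler returns false or throws
   * are collected in a side list and put back only after the loop ends,
   * so each item is attempted at most once per call.
   *
   * @return the number of attempts made in this call
   */
  static <T> int processAll(Queue<T> queue, Predicate<T> handler) {
    List<T> failed = new ArrayList<>();
    int attempts = 0;
    T item;
    while ((item = queue.poll()) != null) {
      attempts++;
      boolean ok;
      try {
        ok = handler.test(item);
      } catch (RuntimeException e) {
        ok = false;
      }
      if (!ok) {
        // Requeueing here would make the loop pick the item up again
        // immediately; defer it instead.
        failed.add(item);
      }
    }
    // Failed items become visible only to the next processAll call.
    queue.addAll(failed);
    return attempts;
  }

  public static void main(String[] args) {
    Queue<String> queue = new ArrayDeque<>(Arrays.asList("a", "b", "c"));
    // "b" always fails in this toy example.
    int attempts = processAll(queue, s -> !s.equals("b"));
    System.out.println(attempts + " attempts, requeued: " + queue);
  }
}
```

With requeueing inside the loop, the always-failing item would be polled again and again within a single call. Deferring it bounds the work per call to the number of items queued when the call started.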
   
   https://issues.apache.org/jira/browse/HDDS-7989
   
   ## How was this patch tested?
   
   Added unit test.
   
   Also verified together with HDDS-7982 (which uncovered the problem without 
this fix):
   
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4207471575/jobs/7302558782
   
   Regular CI:
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/4207414175


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

