MikeBarskii opened a new pull request, #1701:
URL: https://github.com/apache/samza/pull/1701

   LISAMZA-43659
   
   **Description**: According to 
`org/apache/samza/clustermanager/ContainerProcessManager.java:520`, the whole 
app is shut down after too many container failures when both of the following 
rules hold:
   
   1. Failure count for a task group id must be > the configured retry count
   2. The last failure (the one prior to this one) must have happened less than 
the retry window (in ms) ago
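
   The two rules above can be sketched as a single predicate. This is only an 
illustration, not the code in `ContainerProcessManager`: the class name 
`RetryRule`, the method `shouldShutDown`, and its parameters are hypothetical, 
and it assumes the failure count already includes the current failure.

```java
import java.time.Duration;

/** Hypothetical sketch of the two-rule check; not the Samza implementation. */
final class RetryRule {
    private RetryRule() {}

    /**
     * @param failureCount     failures recorded so far (assumed to include the current one)
     * @param retryCount       configured retry count
     * @param sinceLastFailure time elapsed between the previous failure and this one
     * @param retryWindow      configured retry window
     * @return true only when both rules hold
     */
    static boolean shouldShutDown(int failureCount, int retryCount,
                                  Duration sinceLastFailure, Duration retryWindow) {
        // Rule 1: the failure count must exceed the configured retry count.
        boolean tooManyFailures = failureCount > retryCount;
        // Rule 2: the previous failure must be more recent than the retry window.
        boolean withinWindow = sinceLastFailure.compareTo(retryWindow) < 0;
        return tooManyFailures && withinWindow;
    }
}
```

   Under this sketch, a failure count above the retry count is not enough on 
its own: if the previous failure happened longer ago than the retry window, the 
predicate is false.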
   
   **Issue**: The log message at 
`org/apache/samza/clustermanager/ContainerProcessManager.java:575` doesn't 
clearly reflect point 2 of the counting behavior:
   
   > Processor ID: {} (current Container ID: {}) has failed {} times, with last 
failure {} ms ago. This is greater than retry count of {} and window of {} ms
   
   **Fix**: Include the information about point 2 in the log message:
   
   > Processor ID: {} (current Container ID: {}) has failed {} times. This is 
greater than the retry count of {}. The failure occurred {} ms after the 
previous one, which is less than the retry window of {} ms.
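
   To preview how the proposed message reads once its six `{}` placeholders 
are filled, here is a small self-contained sketch. The `LogTemplate` class and 
its `fill` helper are hypothetical stand-ins that substitute arguments in 
order; they are not a logging library API, and the sample values in the usage 
note are made up for illustration.

```java
/** Hypothetical helper for previewing the proposed message; not a logging API. */
final class LogTemplate {
    private LogTemplate() {}

    static final String PROPOSED =
        "Processor ID: {} (current Container ID: {}) has failed {} times. "
        + "This is greater than the retry count of {}. The failure occurred {} ms after the "
        + "previous one, which is less than the retry window of {} ms.";

    /** Replaces each "{}" in order with the next argument; extra placeholders are left as-is. */
    static String fill(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = template.indexOf("{}", from);
            if (at < 0) {
                break; // more arguments than placeholders: ignore the rest
            }
            out.append(template, from, at).append(arg);
            from = at + 2;
        }
        return out.append(template.substring(from)).toString();
    }
}
```

   For example, `LogTemplate.fill(LogTemplate.PROPOSED, "0", "container_01", 9, 
8, 1000L, 300000L)` yields a message in which the count comparison and the 
window comparison appear as two separate sentences.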

