rmatharu commented on a change in pull request #1108: SAMZA-2277: Semantics for 
cluster-manager.container.retry.window.ms not reflected in code
URL: https://github.com/apache/samza/pull/1108#discussion_r305991882
 
 

 ##########
 File path: 
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
 ##########
 @@ -409,32 +411,34 @@ public void onResourceCompleted(SamzaResourceStatus 
resourceStatus) {
             lastFailureTime = failure.getLastFailure();
           } else {
             currentFailCount = 1;
-            lastFailureTime = 0L;
+            lastFailureTime = Instant.now().toEpochMilli();
           }
-          if (currentFailCount >= retryCount) {
-            long lastFailureMsDiff = System.currentTimeMillis() - 
lastFailureTime;
-
-            if (lastFailureMsDiff < retryWindowMs) {
-              log.error("Processor ID: {} (current Container ID: {}) has 
failed {} times, with last failure {} ms ago. " +
-                  "This is greater than retry count of {} and window of {} ms, 
" +
-                  "so shutting down the application master and marking the job 
as failed.",
-                  processorId, containerId, currentFailCount, 
lastFailureMsDiff, retryCount, retryWindowMs);
-
-              // We have too many failures, and we're within the window
-              // boundary, so reset shut down the app master.
-              tooManyFailedContainers = true;
-              state.status = SamzaApplicationState.SamzaAppStatus.FAILED;
-            } else {
-              log.info("Resetting failure count for Processor ID: {} back to 
1, since last failure " +
-                  "(for Container ID: {}) was outside the bounds of the retry 
window.", processorId, containerId);
-
-              // Reset counter back to 1, since the last failure for this
-              // container happened outside the window boundary.
-              processorFailures.put(processorId, new ProcessorFailure(1, 
System.currentTimeMillis()));
-            }
+
+          long lastFailureMsDiff = Instant.now().toEpochMilli() - 
lastFailureTime;
 
 Review comment:
   Minor: Might be preferable to set lastFailureMsDiff in the if-condition 
above, to avoid time-skew in measurements.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to