rmatharu commented on a change in pull request #1108: SAMZA-2277: Semantics for
cluster-manager.container.retry.window.ms not reflected in code
URL: https://github.com/apache/samza/pull/1108#discussion_r305991882
##########
File path:
samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
##########
@@ -409,32 +411,34 @@ public void onResourceCompleted(SamzaResourceStatus
resourceStatus) {
lastFailureTime = failure.getLastFailure();
} else {
currentFailCount = 1;
- lastFailureTime = 0L;
+ lastFailureTime = Instant.now().toEpochMilli();
}
- if (currentFailCount >= retryCount) {
- long lastFailureMsDiff = System.currentTimeMillis() -
lastFailureTime;
-
- if (lastFailureMsDiff < retryWindowMs) {
- log.error("Processor ID: {} (current Container ID: {}) has
failed {} times, with last failure {} ms ago. " +
- "This is greater than retry count of {} and window of {} ms,
" +
- "so shutting down the application master and marking the job
as failed.",
- processorId, containerId, currentFailCount,
lastFailureMsDiff, retryCount, retryWindowMs);
-
- // We have too many failures, and we're within the window
- // boundary, so reset shut down the app master.
- tooManyFailedContainers = true;
- state.status = SamzaApplicationState.SamzaAppStatus.FAILED;
- } else {
- log.info("Resetting failure count for Processor ID: {} back to
1, since last failure " +
- "(for Container ID: {}) was outside the bounds of the retry
window.", processorId, containerId);
-
- // Reset counter back to 1, since the last failure for this
- // container happened outside the window boundary.
- processorFailures.put(processorId, new ProcessorFailure(1,
System.currentTimeMillis()));
- }
+
+ long lastFailureMsDiff = Instant.now().toEpochMilli() -
lastFailureTime;
Review comment:
Minor: Might be preferable to set lastFailureMsDiff in the if-condition
above, to avoid time-skew in measurements.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services