dnishimura commented on a change in pull request #1156: SAMZA-2323: Provide
option allow single containers to fail without failing the job
URL: https://github.com/apache/samza/pull/1156#discussion_r326292642
##########
File path: samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
##########
@@ -513,12 +509,23 @@ void onResourceCompletedWithUnknownStatus(SamzaResourceStatus resourceStatus, St
     int retryCount = clusterManagerConfig.getContainerRetryCount();
     int retryWindowMs = clusterManagerConfig.getContainerRetryWindowMs();
     int currentFailCount;
+    boolean retryContainerRequest = true;
     if (retryCount == 0) {
       LOG.error("Processor ID: {} (current Container ID: {}) failed, and retry count is set to 0, " +
           "so shutting down the application master and marking the job as failed.", processorId, containerId);
-      jobFailureCriteriaMet = true;
+      // Failure criteria met only if failed containers can fail the job.
+      jobFailureCriteriaMet = clusterManagerConfig.getContainerFailJobAfterRetries();
+      if (jobFailureCriteriaMet) {
+        LOG.error("Processor ID: {} (current Container ID: {}) failed, and retry count is set to 0, " +
+            "so shutting down the application master and marking the job as failed.", processorId, containerId);
+      } else {
+        LOG.error("Processor ID: {} (current Container ID: {}) failed, and retry count is set to 0, " +
+            "marking the container as completed and will not retry request.", processorId, containerId);
+        incrementCompletedProcessorsAndUpdateState(state);
Review comment:
Correct, we don't need the call. I removed it and tested on a real job.
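
For reference, a minimal sketch of the zero-retry decision this hunk introduces. The class `RetryDecisionSketch`, the `Decision` holder, and `decideForZeroRetries` are hypothetical names used only for illustration and are not part of Samza; only the flag names `jobFailureCriteriaMet` and `retryContainerRequest` come from the diff. It also assumes the container request is not retried when the retry count is 0, which the visible part of the hunk does not show.

```java
public class RetryDecisionSketch {

    /** Hypothetical holder for the two flags that appear in the diff. */
    static final class Decision {
        final boolean jobFailureCriteriaMet;
        final boolean retryContainerRequest;

        Decision(boolean jobFailureCriteriaMet, boolean retryContainerRequest) {
            this.jobFailureCriteriaMet = jobFailureCriteriaMet;
            this.retryContainerRequest = retryContainerRequest;
        }
    }

    /**
     * Decision for a failed container when the retry count is 0.
     *
     * @param failJobAfterRetries stands in for the value returned by
     *        clusterManagerConfig.getContainerFailJobAfterRetries()
     */
    static Decision decideForZeroRetries(boolean failJobAfterRetries) {
        // The job fails only if failed containers are allowed to fail the job.
        boolean jobFailureCriteriaMet = failJobAfterRetries;
        // Assumption: with zero retries the container is never re-requested.
        boolean retryContainerRequest = false;
        return new Decision(jobFailureCriteriaMet, retryContainerRequest);
    }

    public static void main(String[] args) {
        for (boolean failJob : new boolean[] {true, false}) {
            Decision d = decideForZeroRetries(failJob);
            System.out.println("failJobAfterRetries=" + failJob
                + " jobFailureCriteriaMet=" + d.jobFailureCriteriaMet
                + " retryContainerRequest=" + d.retryContainerRequest);
        }
    }
}
```

In this sketch neither branch touches any completed-processor counter, which matches the outcome of this thread: the extra call in the `else` branch was not needed.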