abstractdog commented on a change in pull request #152:
URL: https://github.com/apache/tez/pull/152#discussion_r753052967



##########
File path: tez-api/src/main/java/org/apache/tez/dag/api/TezConfiguration.java
##########
@@ -300,6 +300,24 @@ public TezConfiguration(boolean loadDefaults) {
       TEZ_AM_PREFIX + "max.allowed.time-sec.for-read-error";
   public static final int TEZ_AM_MAX_ALLOWED_TIME_FOR_TASK_READ_ERROR_SEC_DEFAULT = 300;
 
+  /**
+   * Double value. Assuming that a certain number of downstream hosts reported fetch failures for a
+   * given upstream host, this config drives the max allowed ratio of (downstream hosts) / (all hosts).
+   * The total number of used hosts is tracked by AMNodeTracker, and the distinct number of downstream
+   * hosts blaming a source (upstream) task in a given vertex is divided by that total. If the fraction
+   * is beyond this limit, the upstream task attempt is marked as failed (so it is blamed for the
+   * fetch failure).
+   * E.g. if this is set to 0.2 and 3 different hosts report fetch failures for the same upstream host
+   * in a cluster which currently utilizes 10 nodes, the upstream task is immediately blamed for the
+   * fetch failure.
+   *
+   * Expert level setting.
+   */
+  @ConfigurationScope(Scope.AM)
+  @ConfigurationProperty(type="double")
+  public static final String TEZ_AM_MAX_ALLOWED_DOWNSTREAM_HOST_FAILURES_FRACTION =
+      TEZ_AM_PREFIX + "max.allowed.downstream.host.failures.fraction";
+  public static final double TEZ_AM_MAX_ALLOWED_DOWNSTREAM_HOST_FAILURES_FRACTION_DEFAULT = 0.2;
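For context, a minimal usage sketch (not part of the patch) of how the new property could be set and read; TezConfiguration extends Hadoop's Configuration, so setDouble/getDouble apply, and the 0.25 value below is purely illustrative:

```java
// Hypothetical usage: overriding the fraction for a small cluster (illustrative value).
TezConfiguration conf = new TezConfiguration();
conf.setDouble(TezConfiguration.TEZ_AM_MAX_ALLOWED_DOWNSTREAM_HOST_FAILURES_FRACTION, 0.25);

// Reading it back, falling back to the 0.2 default defined above.
double fraction = conf.getDouble(
    TezConfiguration.TEZ_AM_MAX_ALLOWED_DOWNSTREAM_HOST_FAILURES_FRACTION,
    TezConfiguration.TEZ_AM_MAX_ALLOWED_DOWNSTREAM_HOST_FAILURES_FRACTION_DEFAULT);
```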

Review comment:
       I see. I'm assuming that on that small cluster a fraction of 0.25 might work properly (so in case of 4 hosts, a single failing downstream host won't make the source restart immediately; at least 2 downstream reporting hosts are needed).
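
To illustrate the arithmetic, here is a minimal sketch of the fraction check described in the javadoc; the strict greater-than reading of "beyond this limit" and the variable names are assumptions, not the actual AMNodeTracker code:

```java
// Hypothetical fraction check, assuming "beyond this limit" means strictly greater than.
double fraction = 0.25;     // proposed value for the small cluster
int totalHosts = 4;         // hosts currently tracked by AMNodeTracker
int blamingHosts = 1;       // distinct downstream hosts reporting fetch failures for the source

boolean blameUpstreamTask = (double) blamingHosts / totalHosts > fraction;
// 1 / 4 = 0.25 -> not beyond 0.25, the upstream attempt is not failed yet
// 2 / 4 = 0.50 -> beyond 0.25, the upstream attempt is marked as failed
```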



