Yi Zhang created TEZ-4513:
-----------------------------

             Summary: Add feature to fail DAG when too many re-runs
                 Key: TEZ-4513
                 URL: https://issues.apache.org/jira/browse/TEZ-4513
             Project: Apache Tez
          Issue Type: Improvement
    Affects Versions: 0.10.2
            Reporter: Yi Zhang

Sometimes, when nodes fail, shuffle data is lost and the producer tasks have to be re-run. Those tasks' ancestors may in turn need to be re-run, but the cluster may not have enough resources to re-run all of them quickly. In this scenario, it may be desirable to fail the DAG.
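
One possible shape for such a check, as a minimal sketch only: count task re-runs for the DAG and report when a configured limit is exceeded, so the caller can fail the DAG instead of continuing to reschedule. The class name RerunLimitPolicy, its methods, and the idea of a single DAG-wide limit are assumptions made for illustration. None of this is existing Tez code or a Tez configuration key, and the ticket does not specify how re-runs should be counted.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of Tez: tracks task re-runs for one DAG
 * and signals when a configured DAG-wide limit has been exceeded.
 */
class RerunLimitPolicy {
    // Assumed knob: maximum number of re-runs tolerated across the whole DAG.
    private final int maxTotalReruns;
    // Per-vertex counts, kept only for diagnostics in this sketch.
    private final Map<String, Integer> rerunsPerVertex = new HashMap<>();
    private int totalReruns = 0;

    RerunLimitPolicy(int maxTotalReruns) {
        if (maxTotalReruns < 0) {
            throw new IllegalArgumentException("maxTotalReruns must be >= 0");
        }
        this.maxTotalReruns = maxTotalReruns;
    }

    /**
     * Records one re-run of a task belonging to the given vertex.
     *
     * @return true if the total number of re-runs now exceeds the limit,
     *         meaning the caller may want to fail the DAG
     */
    synchronized boolean recordRerun(String vertexName) {
        rerunsPerVertex.merge(vertexName, 1, Integer::sum);
        totalReruns++;
        return totalReruns > maxTotalReruns;
    }

    synchronized int getTotalReruns() {
        return totalReruns;
    }

    synchronized int getReruns(String vertexName) {
        return rerunsPerVertex.getOrDefault(vertexName, 0);
    }
}
```

With a limit of 2, the first two calls to recordRerun return false and the third returns true. A real implementation would also have to decide whether to count re-runs per vertex, per task, or per DAG, and whether the limit should be absolute or relative to the number of tasks.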