Yi Zhang created TEZ-4513:
-----------------------------

             Summary: Add feature to fail DAG when too many re-runs
                 Key: TEZ-4513
                 URL: https://issues.apache.org/jira/browse/TEZ-4513
             Project: Apache Tez
          Issue Type: Improvement
    Affects Versions: 0.10.2
            Reporter: Yi Zhang


Sometimes when node failures happen, shuffle data is lost and the producer
tasks are re-run. Those tasks' ancestors may in turn need to re-run, but the
cluster may not have enough resources to re-run them quickly. In this
scenario, it may be desirable to fail the DAG.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
