Rajesh Balamohan created TEZ-3479:
-------------------------------------

             Summary: DAG AM does not schedule any more containers in corner cases
                 Key: TEZ-3479
                 URL: https://issues.apache.org/jira/browse/TEZ-3479
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Rajesh Balamohan



Env: 3-node AWS cluster with data residing in S3. Tez version is 0.7.

Some workloads generate so much data that tasks start throwing "No space 
available" errors on the local disks (e.g. Q29 in TPC-DS). The DAG should fail 
after enough retries, and most of the time it does. Once in a while (roughly 
once in 20-30 runs), however, the DAG AM gets into a hung state and does not 
schedule any more containers for the failed task attempts. Will attach the logs 
shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)