Rajesh Balamohan updated TEZ-3479:
    Attachment: application_1476667862449_0031_not_complete.1.log.tar.gz

> DAG AM does not schedule any more containers in corner cases
> ------------------------------------------------------------
>                 Key: TEZ-3479
>                 URL: https://issues.apache.org/jira/browse/TEZ-3479
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.7.1
>            Reporter: Rajesh Balamohan
>         Attachments: application_1476667862449_0031_not_complete.1.log.tar.gz
> Env: 3 node AWS cluster with data residing in S3. Tez version is 0.7.
> Some workloads generate so much data that tasks start throwing 
> "No space available" errors on local disks (e.g. Q29 in TPCDS). The DAG 
> should fail after enough retries, which is what happens most of the time. 
> Once in a while (about once in 20-30 runs), the DAG AM gets into a hung 
> state and does not schedule any more containers for the failed task 
> attempts. Will attach the logs shortly. 

This message was sent by Atlassian JIRA
