[ 
https://issues.apache.org/jira/browse/TEZ-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402982#comment-15402982
 ] 

Jason Lowe commented on TEZ-3391:
---------------------------------

bq. The Tez AM also reported that too many containers were running, while in 
practice it was not.

This was technically "correct" in the sense that the DAG status is reporting 
how many *tasks*, not attempts, are running.  One cannot assume that the 
counter being shown for "Running: " means how many task attempts or containers 
are currently executing.

A task goes into the running state as soon as the first attempt for it is 
launched.  In this particular case a large number of tasks all had one attempt 
start and then promptly fail.  That left the tasks in the running state.  Most 
were waiting for another attempt to launch with no attempt for them currently 
running.  The key distinction is task vs. attempt.  A task can be in the 
running state with no attempt currently running for it.  A separate 
RunningTaskAttempts counter being reported in the DAG status would have made 
this more explicit.
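
To illustrate the distinction, here is a minimal, self-contained sketch. It is 
not Tez code: the class, enum, and method names (TaskVsAttemptSketch, 
launchAttempt, failAttempt, runningTasks, runningTaskAttempts) are all 
hypothetical, and the state model is deliberately simplified. It only shows 
how a count of running tasks and a count of running attempts can diverge once 
first attempts fail and the retries have not launched yet.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model; not the actual Tez state machine.
public class TaskVsAttemptSketch {
    enum AttemptState { RUNNING, FAILED, SUCCEEDED }
    enum TaskState { NEW, RUNNING, SUCCEEDED }

    static class Task {
        final List<AttemptState> attempts = new ArrayList<>();
        TaskState state = TaskState.NEW;

        // Launching the first attempt moves the task to RUNNING.
        int launchAttempt() {
            attempts.add(AttemptState.RUNNING);
            if (state == TaskState.NEW) {
                state = TaskState.RUNNING;
            }
            return attempts.size() - 1;
        }

        // A failed attempt does not change the task state here: the task
        // stays RUNNING while it waits for another attempt to launch.
        void failAttempt(int attemptId) {
            attempts.set(attemptId, AttemptState.FAILED);
        }

        long runningAttempts() {
            return attempts.stream()
                    .filter(a -> a == AttemptState.RUNNING)
                    .count();
        }
    }

    // What a "Running: " counter based on task state would report.
    static long runningTasks(List<Task> tasks) {
        return tasks.stream()
                .filter(t -> t.state == TaskState.RUNNING)
                .count();
    }

    // What a separate RunningTaskAttempts-style counter would report.
    static long runningTaskAttempts(List<Task> tasks) {
        return tasks.stream().mapToLong(Task::runningAttempts).sum();
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Task t = new Task();
            int id = t.launchAttempt();
            if (i < 4) {
                t.failAttempt(id); // first attempt starts, then promptly fails
            }
            tasks.add(t);
        }
        System.out.println("Running: " + runningTasks(tasks));
        System.out.println("RunningTaskAttempts: " + runningTaskAttempts(tasks));
    }
}
```

In this sketch five tasks each launch one attempt and four of those attempts 
fail, so the task-based counter reports 5 while only 1 attempt is actually 
executing.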

> MR split file validation should be done in the AM
> -------------------------------------------------
>
>                 Key: TEZ-3391
>                 URL: https://issues.apache.org/jira/browse/TEZ-3391
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
>   We had a case where the split metadata size exceeded 10000000. Instead of 
> the job failing validation during initialization in the AM, as in MapReduce, 
> each of the tasks failed that validation during its own initialization.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
