[
https://issues.apache.org/jira/browse/TEZ-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402982#comment-15402982
]
Jason Lowe commented on TEZ-3391:
---------------------------------
bq. The Tez AM also reported that too many containers were running, while in
practice they were not.
This was technically "correct" in the sense that the DAG status is reporting
how many *tasks*, not attempts, are running. One cannot assume that the
counter being shown for "Running: " means how many task attempts or containers
are currently executing.
A task goes into the running state as soon as the first attempt for it is
launched. In this particular case a large number of tasks all had one attempt
start and then promptly fail. That left the tasks in the running state. Most
were waiting for another attempt to launch with no attempt for them currently
running. The key distinction is task vs. attempt. A task can be in the
running state with no attempt currently running for it. A separate
RunningTaskAttempts counter being reported in the DAG status would have made
this more explicit.
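To make the task vs. attempt distinction concrete, here is a minimal sketch.
It is a toy model, not Tez code: the class, enum, and method names
(TaskVsAttemptSketch, Task, AttemptState, tasksWithOneFailedAttempt) are all
hypothetical, and it ignores details such as a task failing once it runs out
of attempts. It only shows how a "running tasks" count and a "running
attempts" count can diverge when every task has had one attempt start and
then fail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these are Tez classes.
final class TaskVsAttemptSketch {
    enum AttemptState { RUNNING, FAILED, SUCCEEDED }

    static final class Task {
        final List<AttemptState> attempts = new ArrayList<>();
        boolean succeeded = false;

        // Assumption: a task enters the running state when its first attempt
        // launches and stays there until the task itself completes.
        boolean isRunning() {
            return !attempts.isEmpty() && !succeeded;
        }

        // Number of this task's attempts that are executing right now.
        long runningAttempts() {
            return attempts.stream()
                    .filter(s -> s == AttemptState.RUNNING)
                    .count();
        }
    }

    // What a task-level "Running" counter would report.
    static long runningTasks(List<Task> tasks) {
        return tasks.stream().filter(Task::isRunning).count();
    }

    // What a separate attempt-level counter would report.
    static long runningAttempts(List<Task> tasks) {
        return tasks.stream().mapToLong(Task::runningAttempts).sum();
    }

    // Builds n tasks whose single attempt launched and then promptly failed.
    static List<Task> tasksWithOneFailedAttempt(int n) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Task t = new Task();
            t.attempts.add(AttemptState.FAILED);
            tasks.add(t);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> tasks = tasksWithOneFailedAttempt(100);
        System.out.println("Running tasks:    " + runningTasks(tasks));
        System.out.println("Running attempts: " + runningAttempts(tasks));
    }
}
```

In this model all 100 tasks count as running while zero attempts are
executing, which mirrors the situation described above.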
> MR split file validation should be done in the AM
> -------------------------------------------------
>
> Key: TEZ-3391
> URL: https://issues.apache.org/jira/browse/TEZ-3391
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
>
> We had a case where the split metadata size exceeded 10000000. Instead of the
> job failing validation during initialization in the AM, as it does in
> MapReduce, each of the tasks failed that validation during its own
> initialization.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)