[ https://issues.apache.org/jira/browse/TEZ-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402982#comment-15402982 ]
Jason Lowe commented on TEZ-3391:
---------------------------------

bq. The Tez AM also reported that too many containers were running, while in practice it was not.

This was technically "correct" in the sense that the DAG status is reporting how many *tasks*, not attempts, are running. One cannot assume that the counter being shown for "Running: " means how many task attempts or containers are currently executing.

A task goes into the running state as soon as the first attempt for it is launched. In this particular case a large number of tasks all had one attempt start and then promptly fail. That left the tasks in the running state. Most were waiting for another attempt to launch with no attempt for them currently running.

The key distinction is task vs. attempt. A task can be in the running state with no attempt currently running for it. A separate RunningTaskAttempts counter being reported in the DAG status would have made this more explicit.

> MR split file validation should be done in the AM
> -------------------------------------------------
>
>                 Key: TEZ-3391
>                 URL: https://issues.apache.org/jira/browse/TEZ-3391
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
> We had a case where Split metadata size exceeded 10000000. Instead of job
> failing from validation during initialization in AM like mapreduce, each of
> the tasks failed doing that validation during initialization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
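
To make the task vs. attempt distinction from the comment concrete, here is a small toy model. It is only a sketch and not Tez code: the Task class, the AttemptState enum, and the dag_status helper are hypothetical names invented for illustration, and the running_task_attempts field stands in for the separate RunningTaskAttempts counter suggested in the comment. It assumes, as a simplification, that a task stays running once any attempt has launched and until one succeeds.

```python
from dataclasses import dataclass, field
from enum import Enum


class AttemptState(Enum):
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    SUCCEEDED = "SUCCEEDED"


@dataclass
class Task:
    """Hypothetical task record for illustration; not Tez's task implementation."""
    attempts: list = field(default_factory=list)

    def launch_attempt(self):
        self.attempts.append(AttemptState.RUNNING)

    def fail_running_attempt(self):
        # Mark the first currently running attempt as failed.
        index = self.attempts.index(AttemptState.RUNNING)
        self.attempts[index] = AttemptState.FAILED

    @property
    def is_running(self):
        # Simplifying assumption: the task is "running" from its first
        # attempt launch until some attempt succeeds. Exhausting the
        # maximum number of attempts is not modeled here.
        return bool(self.attempts) and AttemptState.SUCCEEDED not in self.attempts

    @property
    def running_attempts(self):
        return self.attempts.count(AttemptState.RUNNING)


def dag_status(tasks):
    """Report both counters so the two meanings of "running" stay separate."""
    return {
        "running_tasks": sum(t.is_running for t in tasks),
        "running_task_attempts": sum(t.running_attempts for t in tasks),
    }


# Five tasks each launch one attempt; four of those attempts promptly fail.
tasks = [Task() for _ in range(5)]
for task in tasks:
    task.launch_attempt()
for task in tasks[:4]:
    task.fail_running_attempt()

status = dag_status(tasks)
print(status)  # {'running_tasks': 5, 'running_task_attempts': 1}
```

In this model all five tasks still count as running even though only one attempt is executing, which mirrors the situation described above where the "Running: " figure looked too high.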