[ https://issues.apache.org/jira/browse/TEZ-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402926#comment-15402926 ]

Rohini Palaniswamy commented on TEZ-3391:
-----------------------------------------

The Tez AM also reported that far more containers were running than actually were.

{code}
2016-07-28 23:33:00,162 [Timer-1] INFO  org.apache.pig.backend.hadoop.executionengine.tez.TezJob  - DAG Status: status=RUNNING, progress=TotalTasks: 311387 Succeeded: 69939 Running: 164954 Failed: 0 Killed: 0 FailedTaskAttempts: 156417 KilledTaskAttempts: 4807, diagnostics=, counters=null
{code}

{code}
016-07-28 23:02:55,170 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:Dag_1468638337805_452514_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=scope-6, taskAttemptId=attempt_1468638337805_452514_1_03_000013_0, creationTime=1469746848974, allocationTime=1469746936804, startTime=1469746969091, finishTime=1469746975099, timeTaken=6008, status=FAILED, errorEnum=FRAMEWORK_ERROR, diagnostics=Error: Failure while running task:java.io.IOException: Split metadata size exceeded 10000000. Aborting job
        at org.apache.hadoop.mapreduce.split.SplitMetaInfoReaderTez.readSplitMetaInfo(SplitMetaInfoReaderTez.java:79)
        at org.apache.tez.mapreduce.lib.MRInputUtils.readSplits(MRInputUtils.java:53)
        at org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:470)
        at org.apache.tez.mapreduce.input.MRInput.initialize(MRInput.java:443)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeInputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:446)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeInputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:429)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeInputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:414)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}

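It would be good to move this validation into MRInputSplitDistributor, which runs in the AM, so that the DAG fails once during initialization instead of every task attempt failing the same check.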
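A minimal sketch of what an up-front check could look like, assuming the same semantics as MapReduce's split meta-info reader as we understand them (limit taken from `mapreduce.job.split.metainfo.maxsize`, default 10000000, with a non-positive value disabling the check). The class `SplitMetaInfoCheck` and its methods `validate` and `planTasks` are hypothetical illustrations, not Tez APIs; the point is only that the size check runs once, before any task is scheduled, rather than inside each task's `MRInput.initialize`.

```java
import java.io.IOException;

/**
 * Hypothetical sketch, not a Tez API: run the split meta-info size check
 * once, before any task is scheduled, instead of inside every task.
 */
public class SplitMetaInfoCheck {

    // Assumed default, matching the 10000000 limit seen in the log above.
    static final long DEFAULT_MAX_META_INFO_SIZE = 10_000_000L;

    // Throws if the meta-info file exceeds the limit; a non-positive limit disables the check.
    static void validate(long metaInfoFileSize, long maxMetaInfoSize) throws IOException {
        if (maxMetaInfoSize > 0 && metaInfoFileSize > maxMetaInfoSize) {
            throw new IOException(
                "Split metadata size exceeded " + maxMetaInfoSize + ". Aborting job");
        }
    }

    // Hypothetical AM-side step: validate once, then report how many tasks would be scheduled.
    static int planTasks(long metaInfoFileSize, long maxMetaInfoSize, int numTasks)
            throws IOException {
        validate(metaInfoFileSize, maxMetaInfoSize); // fails before any task is launched
        return numTasks;
    }

    public static void main(String[] args) {
        long[] sizes = {5_000_000L, 12_000_000L};
        for (long size : sizes) {
            try {
                int scheduled = planTasks(size, DEFAULT_MAX_META_INFO_SIZE, 100);
                System.out.println(size + ": scheduling " + scheduled + " tasks");
            } catch (IOException e) {
                System.out.println(size + ": " + e.getMessage());
            }
        }
    }
}
```

Running `main` prints `5000000: scheduling 100 tasks` followed by `12000000: Split metadata size exceeded 10000000. Aborting job`. With the check in the AM, an oversized split file produces one failure instead of tens of thousands of failed task attempts.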

> MR split file validation should be done in the AM
> -------------------------------------------------
>
>                 Key: TEZ-3391
>                 URL: https://issues.apache.org/jira/browse/TEZ-3391
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
>   We had a case where the split metadata size exceeded 10000000. Instead of the
> job failing validation during initialization in the AM, as it does in MapReduce,
> each of the tasks failed the same validation during its own initialization.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
