[
https://issues.apache.org/jira/browse/TEZ-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402926#comment-15402926
]
Rohini Palaniswamy commented on TEZ-3391:
-----------------------------------------
The Tez AM also reported that too many containers were running, while in
practice they were not.
{code}
2016-07-28 23:33:00,162 [Timer-1] INFO
org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status:
status=RUNNING, progress=TotalTasks: 311387 Succeeded: 69939 Running: 164954
Failed: 0 Killed: 0 FailedTaskAttempts: 156417 KilledTaskAttempts: 4807,
diagnostics=, counters=null
{code}
{code}
2016-07-28 23:02:55,170 [INFO] [Dispatcher thread {Central}]
|history.HistoryEventHandler|:
[HISTORY][DAG:Dag_1468638337805_452514_1][Event:TASK_ATTEMPT_FINISHED]:
vertexName=scope-6, taskAttemptId=attempt_1468638337805_452514_1_03_000013_0,
creationTime=1469746848974, allocationTime=1469746936804,
startTime=1469746969091, finishTime=1469746975099, timeTaken=6008,
status=FAILED, errorEnum=FRAMEWORK_ERROR, diagnostics=Error: Failure while
running task:java.io.IOException: Split metadata size exceeded 10000000.
Aborting job
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReaderTez.readSplitMetaInfo(SplitMetaInfoReaderTez.java:79)
at org.apache.tez.mapreduce.lib.MRInputUtils.readSplits(MRInputUtils.java:53)
at org.apache.tez.mapreduce.input.MRInput.initializeInternal(MRInput.java:470)
at org.apache.tez.mapreduce.input.MRInput.initialize(MRInput.java:443)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeInputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:446)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeInputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:429)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeInputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:414)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
It would be good to move this validation to MRInputSplitDistributor, so that
it runs once in the AM instead of in every task.
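For illustration only, here is a rough sketch of the kind of fail-fast size check that could run once in the AM. This is not the actual Tez or Hadoop code: the class and method names are hypothetical, and treating a non-positive limit as "check disabled" is an assumption. Only the 10000000 limit and the error message text are taken from the log above.

```java
import java.io.IOException;

public class SplitMetaSizeCheck {
    // Limit seen in the task failure above (10000000 bytes).
    public static final long DEFAULT_MAX_SPLIT_META_SIZE = 10_000_000L;

    /**
     * Hypothetical helper: rejects a split meta-info file that is larger
     * than the configured limit. Assumption: a limit <= 0 disables the check.
     */
    public static void validateSplitMetaSize(long metaFileLength, long maxSize)
            throws IOException {
        if (maxSize > 0 && metaFileLength > maxSize) {
            throw new IOException(
                "Split metadata size exceeded " + maxSize + ". Aborting job");
        }
    }

    public static void main(String[] args) {
        // Lengths just below, at, and just above the limit.
        long[] lengths = {9_999_999L, 10_000_000L, 10_000_001L};
        for (long len : lengths) {
            try {
                validateSplitMetaSize(len, DEFAULT_MAX_SPLIT_META_SIZE);
                System.out.println(len + " bytes: ok");
            } catch (IOException e) {
                System.out.println(len + " bytes: " + e.getMessage());
            }
        }
    }
}
```

Running a check like this once during input initialization would fail the DAG with a single error, rather than letting every task attempt repeat the same failure.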
> MR split file validation should be done in the AM
> -------------------------------------------------
>
> Key: TEZ-3391
> URL: https://issues.apache.org/jira/browse/TEZ-3391
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
>
> We had a case where the split metadata size exceeded 10000000. Instead of
> the job failing validation during initialization in the AM, as in MapReduce,
> each of the tasks failed while doing that validation during its own
> initialization.
>