[ 
https://issues.apache.org/jira/browse/TEZ-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277201#comment-14277201
 ] 

Hitesh Shah commented on TEZ-1947:
----------------------------------

The code also has a typo that could be fixed: "Invlaid configuration:" should 
read "Invalid configuration:".

MR had a notion of checking job specifications before anything was run. This 
was done on the client as part of submission. We could probably do something 
similar, but it would affect all runtime library components. There is also the 
question of whether to run the check on the client or in the AM: the AM may 
not have all the jars needed to instantiate all custom objects.
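
As a rough illustration of what a submission-time check could look like, here 
is a minimal sketch. The class and method names (ShuffleConfigPrecheck, 
validate) are hypothetical and not part of Tez, and the check assumes both 
values have already been computed as byte counts. It only mirrors the 
condition stated in the error message from the issue description 
(maxSingleShuffleLimit should be less than mergeThreshold), using the two 
values from that stack trace.

```java
public class ShuffleConfigPrecheck {

    /**
     * Hypothetical helper, not a Tez API: rejects the configuration unless
     * maxSingleShuffleLimit is strictly less than mergeThreshold.
     */
    public static void validate(long maxSingleShuffleLimit, long mergeThreshold) {
        if (maxSingleShuffleLimit >= mergeThreshold) {
            throw new IllegalArgumentException(
                "Invalid configuration: maxSingleShuffleLimit should be less than"
                + " mergeThreshold (maxSingleShuffleLimit: " + maxSingleShuffleLimit
                + ", mergeThreshold: " + mergeThreshold + ")");
        }
    }

    public static void main(String[] args) {
        // Values copied from the stack trace in the issue description.
        try {
            validate(238251152L, 148668720L);
            System.out.println("configuration accepted");
        } catch (IllegalArgumentException e) {
            // The bad values are reported before any task is launched.
            System.out.println("rejected before submission: " + e.getMessage());
        }
    }
}
```

Whether something like this would run on the client or in the AM is the open 
question above; the sketch deliberately avoids instantiating any custom 
objects so that it does not depend on which jars are available.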

> Failing fast when DAG configs have wrong values can save cluster resources
> --------------------------------------------------------------------------
>
>                 Key: TEZ-1947
>                 URL: https://issues.apache.org/jira/browse/TEZ-1947
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>
> It would be beneficial to do certain config checks (wherever possible) 
> upfront rather than having the DAG fail later downstream. For example, in 
> the following run the DAG failed after 400+ seconds because of a config issue.
> {code}
> Status: Running (Executing on YARN cluster with App id 
> application_1421164610335_0060)
> --------------------------------------------------------------------------------
>         VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
> KILLED
> --------------------------------------------------------------------------------
> Map 1 ......          KILLED    251        170        0       81       0      
> 81
> Reducer 2             FAILED   1009          0        0     1009      23    
> 1008
> --------------------------------------------------------------------------------
> VERTICES: 00/02  [===>>-----------------------] 13%   ELAPSED TIME: 449.01 s
> --------------------------------------------------------------------------------
> Status: Failed
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1421164610335_0060_1_01, 
> diagnostics=[Task failed, taskId=task_1421164610335_0060_1_01_000004, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
> should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
> mergeThreshold: 148668720
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:260)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:206)
>         at 
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> ], TaskAttempt 1 failed, info=[Error: Failure while running 
> task:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit 
> should be less than mergeThresholdmaxSingleShuffleLimit: 238251152, 
> mergeThreshold: 148668720
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:260)
>         at 
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:206)
>         at 
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:124)
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:405)
>         at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.call(LogicalIOProcessorRuntimeTask.java:393)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
