[ 
https://issues.apache.org/jira/browse/TEZ-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1238:
-----------------------------

    Summary: Display more clear diagnostics info on client side on task 
failures   (was: Display more clear diagnostics info on client side if missing 
jar in LocalResource or Exception happen in Processor)

> Display more clear diagnostics info on client side on task failures 
> --------------------------------------------------------------------
>
>                 Key: TEZ-1238
>                 URL: https://issues.apache.org/jira/browse/TEZ-1238
>             Project: Apache Tez
>          Issue Type: Sub-task
>    Affects Versions: 0.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: Tez-1238-2.patch, Tez-1238-3.patch, Tez-1238.patch
>
>
> I have a Tez job that failed because I did not add my jar to the local 
> resources. On the client side, however, the exception does not make it clear 
> what went wrong. The real cause is that the Processor class could not be 
> loaded. I had to run the "yarn logs" command to find the real exception 
> in the container logs.  
> In another case, my Processor itself threw an exception, and the message 
> on the client side was still unclear. I think we should pass the 
> real exception into the diagnostics and display it on the client side; this 
> would help users find out what is wrong with their program.
> *Exception on client side*
> {code}
> 14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName: summer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 1
> 14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName: tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0
> 14/06/26 14:57:15 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=FAILED
> DAG diagnostics:[Vertex failed, vertexName=tokenizer,
> vertexId=vertex_1403765612557_0004_1_00, diagnostics=[Task failed,
> taskId=task_1403765612557_0004_1_00_000000, diagnostics=[TaskAttempt 0
> failed, info=[Container container_1403765612557_0004_01_000002 COMPLETED
> with diagnostics set to [Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> org.apache.hadoop.util.Shell$ExitCodeException:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>     at org.apache.hadoop.util.Shell.run(Shell.java:418)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1
> {code}
> *The real exception in container log:*
> {code}
> 2014-06-26 14:57:02,146 ERROR [main] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[main,5,main] threw an Exception.
> org.apache.tez.dag.api.TezUncheckedException: Unable to load class: com.zjffdu.tutorial.tez.WordCount$TokenProcessor
>     at org.apache.tez.common.RuntimeUtils.getClazz(RuntimeUtils.java:44)
>     at org.apache.tez.common.RuntimeUtils.createClazzInstance(RuntimeUtils.java:66)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.createProcessor(LogicalIOProcessorRuntimeTask.java:533)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.<init>(LogicalIOProcessorRuntimeTask.java:146)
>     at org.apache.tez.runtime.task.TezTaskRunner.<init>(TezTaskRunner.java:78)
>     at org.apache.tez.runtime.task.TezChild.run(TezChild.java:208)
>     at org.apache.tez.runtime.task.TezChild.main(TezChild.java:363)
> {code}
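
The request above, passing the real exception into the diagnostics rather than leaving it only in the container log, could look roughly like the following. This is a minimal sketch and not the attached patch or Tez's actual code: the class `DiagnosticsSketch`, the helpers `toDiagnostics` and `createProcessor`, and the class name `com.example.MissingProcessor` are all hypothetical, and only JDK classes are used.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class DiagnosticsSketch {

    // Hypothetical helper: renders a Throwable and its cause chain as text,
    // so it can be attached to a diagnostics message sent back to the client.
    public static String toDiagnostics(Throwable t) {
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            t.printStackTrace(pw);
        }
        return sw.toString();
    }

    // Hypothetical stand-in for instantiating a processor class by name.
    // A class missing from the classpath surfaces as a wrapped exception.
    public static Object createProcessor(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Unable to load class: " + className, e);
        }
    }

    public static void main(String[] args) {
        try {
            createProcessor("com.example.MissingProcessor");
        } catch (RuntimeException e) {
            // Instead of letting the container exit with only an exit code,
            // report the full exception text as the task's diagnostics.
            System.out.println("TaskAttempt failed, diagnostics=[" + toDiagnostics(e) + "]");
        }
    }
}
```

With something along these lines, the "Unable to load class" message and its cause would appear in the client-side diagnostics, so the user would not need to run "yarn logs" to find them.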



--
This message was sent by Atlassian JIRA
(v6.2#6252)
