[
https://issues.apache.org/jira/browse/TEZ-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152757#comment-14152757
]
Jeff Zhang commented on TEZ-1621:
---------------------------------
bq. The new throw TezException is to shutdown the local runner?
No, it only shuts down TezChild, which causes the TaskAttempt to fail. The job
then fails and the local runner exits.
bq. Did we get a chance to walk through the above flow in the debugger to see
that the threading/exceptions are flowing as we expect them to?
It works as I expected. *One concern I have is that the code for checking
exception types is duplicated in TezTaskRunner and TaskRunnerCallable. I am not
sure of the purpose, but one of the copies could be removed IMO.*
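As a rough illustration of that consolidation idea, the exception-type check
could live in one helper that both call sites share. This is only a sketch:
the class FailureChecks, the Outcome enum and the rule "any java.lang.Error is
fatal" are assumptions made for the example and are not taken from the Tez
source.

```java
final class FailureChecks {
    /** Hypothetical outcome categories, not taken from the Tez source. */
    enum Outcome { FATAL_EXIT, TASK_FAILED }

    private FailureChecks() {}

    /** Single place that decides how a Throwable raised by a task is handled. */
    static Outcome classify(Throwable t) {
        // Assumption: any java.lang.Error (for example UnsatisfiedLinkError)
        // is fatal for the child process; anything else fails only the task.
        return (t instanceof Error) ? Outcome.FATAL_EXIT : Outcome.TASK_FAILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(new UnsatisfiedLinkError("nativeVerifyChunkedSums")));
        System.out.println(classify(new java.io.IOException("read failed")));
    }
}
```

With a helper like this, both callers would branch on the returned Outcome
instead of repeating their own instanceof checks.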
bq. Also would be good to try this in a single node cluster to make sure we
have not broken other things and this works in the RPC non-local mode as well.
I ran it in local mode and verified that when an error is thrown from the
Processor, the exception can be seen in the client and TezChild is shut down
correctly.
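The ordering this issue asks for (report the error first, then shut down) can
be sketched roughly as below. It is a minimal, self-contained illustration and
not the actual patch: FailureReporter, Shutdown and runTask are hypothetical
names standing in for the channel to the AM and for the process exit, and the
exit status -1 simply mirrors the value seen in the application log.

```java
import java.util.ArrayList;
import java.util.List;

final class ReportBeforeExit {
    /** Hypothetical stand-in for the channel that reports failures to the AM. */
    interface FailureReporter { void reportFailure(Throwable cause); }

    /** Hypothetical stand-in for process shutdown, so the sketch stays testable. */
    interface Shutdown { void exit(int status); }

    private ReportBeforeExit() {}

    static void runTask(Runnable task, FailureReporter reporter, Shutdown shutdown) {
        try {
            task.run();
        } catch (Throwable t) {
            try {
                // Report the failure first so the cause can reach the client.
                reporter.reportFailure(t);
            } finally {
                // Assumption: only java.lang.Error forces the child to exit.
                if (t instanceof Error) {
                    shutdown.exit(-1);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        runTask(() -> { throw new UnsatisfiedLinkError("native method missing"); },
                cause -> events.add("reported " + cause.getClass().getSimpleName()),
                status -> events.add("exit " + status));
        System.out.println(events);
    }
}
```

The finally block keeps the shutdown from being skipped if reporting itself
throws, while still attempting the report before the exit.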
> Should report error to AM before shutting down TezChild
> -------------------------------------------------------
>
> Key: TEZ-1621
> URL: https://issues.apache.org/jira/browse/TEZ-1621
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Deepesh Khandelwal
> Assignee: Jeff Zhang
> Attachments: Tez-1621-2.patch, Tez-1621.patch, app_logs.txt,
> console.txt
>
>
> While running the testorderedwordcount example in session mode, the DAG
> failed with the following error on the console:
> {noformat}
> 14/09/25 01:55:53 INFO examples.TestOrderedWordCount: DAG 1 diagnostics:
> [Vertex failed, vertexName=initialmap,
> vertexId=vertex_1411586515507_0110_1_00, diagnostics=[Task failed,
> taskId=task_1411586515507_0110_1_00_000000, diagnostics=[TaskAttempt 0
> failed, info=[Container container_1411586515507_0110_01_000002 finished with
> diagnostics set to [Container failed. Exception from container-launch.
> Container id: container_1411586515507_0110_01_000002
> Exit code: 255
> Stack trace: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:290)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This wasn't very helpful; the root cause is in the application log:
> {noformat}
> 2014-09-25 01:55:41,246 ERROR [TezChild] org.apache.tez.runtime.task.TezTaskRunner: Exception of type Error. Exiting now
> java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeVerifyChunkedSums(IILjava/nio/ByteBuffer;ILjava/nio/ByteBuffer;IILjava/lang/String;J)V
> at org.apache.hadoop.util.NativeCrc32.nativeVerifyChunkedSums(Native Method)
> at org.apache.hadoop.util.NativeCrc32.verifyChunkedSums(NativeCrc32.java:57)
> at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:291)
> at org.apache.hadoop.hdfs.BlockReaderLocal.fillBuffer(BlockReaderLocal.java:344)
> at org.apache.hadoop.hdfs.BlockReaderLocal.fillDataBuf(BlockReaderLocal.java:444)
> at org.apache.hadoop.hdfs.BlockReaderLocal.readWithBounceBuffer(BlockReaderLocal.java:575)
> at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:539)
> at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
> at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
> at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
> at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
> at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
> at org.apache.hadoop.mapreduce.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.nextKeyValue(TezGroupedSplitsInputFormat.java:167)
> at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
> at org.apache.tez.mapreduce.processor.map.MapProcessor$NewRecordReader.nextKeyValue(MapProcessor.java:266)
> at org.apache.tez.mapreduce.hadoop.mapreduce.MapContextImpl.nextKeyValue(MapContextImpl.java:81)
> at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.tez.mapreduce.processor.map.MapProcessor.runNewMapper(MapProcessor.java:237)
> at org.apache.tez.mapreduce.processor.map.MapProcessor.run(MapProcessor.java:124)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-09-25 01:55:41,250 INFO [TezChild] org.apache.hadoop.util.ExitUtil: Exiting with status -1
> {noformat}
> Attached are the complete console.log and application log.