[
https://issues.apache.org/jira/browse/TEZ-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224931#comment-15224931
]
Jason Lowe commented on TEZ-3196:
---------------------------------
Sample stacktrace:
{noformat}
2016-04-02 08:44:03,058 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an
error while executing task: attempt_1458300907858_475320_1_01_000934_3
org.apache.pig.backend.executionengine.ExecException: ERROR 0:
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
error in shuffle in fetcher {scope_168} #27
at
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:121)
at
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeInputs(PigProcessor.java:332)
at
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:210)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by:
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
error in shuffle in fetcher {scope_168} #27
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:360)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:336)
... 5 more
Caused by: java.lang.InternalError: lzo1x_decompress returned: -8
at
com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
at
com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:292)
at
org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
at
org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
at
org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:626)
at
org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:113)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:510)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
at
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
{noformat}
MapReduce addressed this in MAPREDUCE-5053, and it looks like Tez needs a
similar fix.
> java.lang.InternalError from decompression codec is fatal to a task during
> shuffle
> ----------------------------------------------------------------------------------
>
> Key: TEZ-3196
> URL: https://issues.apache.org/jira/browse/TEZ-3196
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Fix For: 0.7.1
>
>
> Many codecs throw java.lang.InternalError when their native implementations
> encounter an error in the codec. This is not treated like a fetch failure
> and instead is fatal to the task. The task should treat codec errors during
> fetch like other fetch failures and retry, hopefully triggering a re-run of
> the upstream task if necessary.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)