[ 
https://issues.apache.org/jira/browse/TEZ-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224931#comment-15224931
 ] 

Jason Lowe commented on TEZ-3196:
---------------------------------

Sample stacktrace:
{noformat}
2016-04-02 08:44:03,058 [INFO] [TezChild] |task.TezTaskRunner|: Encounted an error while executing task: attempt_1458300907858_475320_1_01_000934_3
org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {scope_168} #27
        at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.attachInputs(POShuffleTezLoad.java:121)
        at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.initializeInputs(PigProcessor.java:332)
        at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:210)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: error in shuffle in fetcher {scope_168} #27
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:360)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:336)
        ... 5 more
Caused by: java.lang.InternalError: lzo1x_decompress returned: -8
        at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
        at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:292)
        at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:88)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199)
        at org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.readToMemory(IFile.java:626)
        at org.apache.tez.runtime.library.common.shuffle.ShuffleUtils.shuffleToMemory(ShuffleUtils.java:113)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:510)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
{noformat}

MapReduce addressed this in MAPREDUCE-5053, and it looks like Tez needs a 
similar fix.
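
As a rough illustration of what such a fix might look like, the sketch below catches java.lang.InternalError around a read/decompress step and rethrows it as an IOException, so that a caller which already handles IOException as a fetch failure can retry instead of failing the task. This is not the actual Tez or MapReduce patch: CodecErrorSketch, ReadStep, readTreatingCodecErrorsAsIo, and fetchOnce are all hypothetical names made up for this example.

```java
import java.io.IOException;

public class CodecErrorSketch {

    /** Hypothetical stand-in for a step that reads and decompresses fetched data. */
    @FunctionalInterface
    public interface ReadStep {
        void run() throws IOException;
    }

    /**
     * Runs the read step and converts a codec InternalError into an IOException,
     * keeping the original error as the cause for diagnostics.
     */
    public static void readTreatingCodecErrorsAsIo(ReadStep step) throws IOException {
        try {
            step.run();
        } catch (InternalError e) {
            throw new IOException("Codec error while reading shuffle data", e);
        }
    }

    /**
     * Hypothetical fetch attempt: returns true on success, false when the
     * attempt should be reported as a fetch failure and retried.
     */
    public static boolean fetchOnce(ReadStep step) {
        try {
            readTreatingCodecErrorsAsIo(step);
            return true;
        } catch (IOException e) {
            // A real fetcher would report the failed input here so it can be retried.
            return false;
        }
    }
}
```

With this shape, an InternalError such as the lzo1x_decompress failure above would surface to the fetcher as an ordinary I/O failure rather than propagating as a fatal error.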

> java.lang.InternalError from decompression codec is fatal to a task during 
> shuffle
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-3196
>                 URL: https://issues.apache.org/jira/browse/TEZ-3196
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>             Fix For: 0.7.1
>
>
> Many codecs throw java.lang.InternalError when their native implementations 
> encounter an error in the codec.  This is not treated like a fetch failure 
> and instead is fatal to the task.  The task should treat codec errors during 
> fetch like other fetch failures and retry, hopefully triggering a re-run of 
> the upstream task if necessary.



