[ 
https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated TEZ-1634:
-------------------------
    Fix Version/s: 0.6.0

> BlockCompressorStream.finish() is called twice in IFile.close leading to 
> Shuffle errors
> ---------------------------------------------------------------------------------------
>
>                 Key: TEZ-1634
>                 URL: https://issues.apache.org/jira/browse/TEZ-1634
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>             Fix For: 0.6.0
>
>         Attachments: BlockCompressorStream.with.logging.java, 
> TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt
>
>
> When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and 
> FSDataOutputStream.close() then calls finish() again internally.  
> Please refer to o.a.h.i.compress.BlockCompressorStream for details on 
> finish(). The second call writes an additional 4 bytes to the IFile, which 
> causes intermittent failures during shuffle and also prevents 
> IFileInputStream from performing proper checksumming.  
> This error happens only when we try to fetch multiple attempt outputs using 
> the same URL, and it is easily reproducible with the Snappy compression 
> codec.  The fetcher downloads the first attempt's output, but because of the 
> extra 4 bytes at the end of the stream, IFileInputStream does not perform 
> proper checksumming.  This causes the download of the subsequent attempt to 
> fail with the following exception.
> An example error from the shuffle phase is shown below.
> >>>>
> 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id 
> java.lang.IllegalArgumentException: Invalid header received:  partition: 0
>       at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
>       at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
>       at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
> >>>>
> I will attach the debug version of BlockCompressorStream with a thread dump 
> (which validates that finish() is called twice in IFile.close()).  This bug 
> was present in earlier versions of Tez as well, and I am now able to 
> reproduce it consistently on a local VM.
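
The double finish() described above can be illustrated with a small standalone sketch. This is not the Tez or Hadoop code: ToyBlockStream, its guardFinish flag, and the bytesWritten helper are hypothetical names invented here, and the toy simply assumes that each finish() call emits a 4-byte length header before flushing its buffered block. It only shows how an explicit finish() followed by a close() that finishes again can leave 4 stray bytes at the end of the output, and how making finish() idempotent would avoid that.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class DoubleFinishSketch {

    /** Toy stand-in for a block compressor stream; NOT Hadoop's BlockCompressorStream. */
    static class ToyBlockStream extends FilterOutputStream {
        private final boolean guardFinish;   // hypothetical switch: make finish() idempotent
        private boolean finished = false;
        private final ByteArrayOutputStream block = new ByteArrayOutputStream();

        ToyBlockStream(OutputStream out, boolean guardFinish) {
            super(out);
            this.guardFinish = guardFinish;
        }

        @Override
        public void write(int b) {
            block.write(b);                  // buffer data until finish()
        }

        /** Assumed behavior: write a 4-byte length header, then the buffered block. */
        public void finish() throws IOException {
            if (guardFinish && finished) {
                return;                      // second call becomes a no-op
            }
            new DataOutputStream(out).writeInt(block.size());
            block.writeTo(out);
            block.reset();
            finished = true;
        }

        @Override
        public void close() throws IOException {
            finish();                        // close() finishes the stream again
            out.close();
        }
    }

    /** Mimics a writer that calls finish() explicitly and then closes the stream. */
    static int bytesWritten(boolean guardFinish) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ToyBlockStream stream = new ToyBlockStream(sink, guardFinish);
        stream.write(new byte[] {1, 2, 3});
        stream.finish();                     // explicit call
        stream.close();                      // implicit second call
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unguarded: " + bytesWritten(false)); // 4 + 3 + 4 = 11 bytes
        System.out.println("guarded:   " + bytesWritten(true));  // 4 + 3 = 7 bytes
    }
}
```

In the unguarded case the sink ends with a second, empty 4-byte header, which corresponds to the extra trailing bytes the report describes. Whether the actual fix removes the explicit call or guards finish() is not stated in this message; see the attached patches.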



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
