[
https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gopal V updated TEZ-1634:
-------------------------
Attachment: TEZ-1634.2.patch
Small cosmetic change, for easier debugging.
Please review - [~rajesh.balamohan].
> BlockCompressorStream.finish() is called twice in IFile.close leading to
> Shuffle errors
> ---------------------------------------------------------------------------------------
>
> Key: TEZ-1634
> URL: https://issues.apache.org/jira/browse/TEZ-1634
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: BlockCompressorStream.with.logging.java,
> TEZ-1634.1.patch, TEZ-1634.2.patch, stacktrace-with-comments.txt
>
>
> When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and
> FSDataOutputStream.close() then calls finish() again internally. Please
> refer to o.a.h.i.compress.BlockCompressorStream for more details on
> finish(). As a result, an additional 4 bytes are written to the IFile. This
> causes intermittent failures during shuffle and also prevents
> IFileInputStream from doing proper checksumming.
> This error happens only when we try to fetch multiple attempt outputs using
> the same URL, and it is easily reproducible with the Snappy compression
> codec. The first attempt's output is downloaded by the fetcher, but because
> of the extra 4 bytes at the end of the stream, IFileInputStream does not
> complete its checksumming. This causes the download of the subsequent
> attempt to fail with the following exception.
> An example error from the shuffle phase is shown below.
> >>>>
> 2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
> java.lang.IllegalArgumentException: Invalid header received: partition: 0
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
> >>>>
> I will attach a debug version of BlockCompressorStream along with a thread
> dump (which confirms that finish() is called twice in IFile.close()). This
> bug was present in earlier versions of Tez as well, and I am now able to
> reproduce it consistently on a local VM.
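
For illustration only, the double finish() described in the issue can be modelled with a toy stream. This is a sketch under stated assumptions, not the Hadoop or Tez code: ToyBlockStream, DoubleFinishDemo, and the guardFinish flag are hypothetical names, and the 4-byte trailer simply stands in for whatever finish() emits in the real stream. The guard shown is one possible way to make finish() idempotent and is not necessarily what the attached patches do.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for a block compressor stream; not the Hadoop class. */
class ToyBlockStream extends FilterOutputStream {
    private final boolean guardFinish; // assumption: optional idempotency guard
    private boolean finished = false;

    ToyBlockStream(OutputStream out, boolean guardFinish) {
        super(out);
        this.guardFinish = guardFinish;
    }

    /** Appends a 4-byte trailer; without the guard, every call appends again. */
    void finish() throws IOException {
        if (guardFinish && finished) {
            return; // already finished, so write nothing more
        }
        out.write(new byte[4]); // toy trailer written straight to the sink
        finished = true;
    }

    /** Mirrors the pattern in the report: close() calls finish() internally. */
    @Override
    public void close() throws IOException {
        finish();
        super.close();
    }
}

class DoubleFinishDemo {
    /** Writes 3 payload bytes, then calls finish() explicitly and via close(). */
    static int bytesWritten(boolean guardFinish) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ToyBlockStream stream = new ToyBlockStream(sink, guardFinish);
        stream.write(new byte[] {1, 2, 3});
        stream.finish(); // explicit call, as the writer does on close
        stream.close();  // finish() is invoked a second time here
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unguarded: " + bytesWritten(false) + " bytes");
        System.out.println("guarded:   " + bytesWritten(true) + " bytes");
    }
}
```

In this toy model the unguarded stream ends up with 11 bytes (3 payload bytes plus two 4-byte trailers), while the guarded one ends up with 7, so the difference matches the 4 extra bytes mentioned in the report.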