Rajesh Balamohan created TEZ-1634:
-------------------------------------
Summary: BlockCompressorStream.finish() is called twice in
IFile.close leading to Shuffle errors
Key: TEZ-1634
URL: https://issues.apache.org/jira/browse/TEZ-1634
Project: Apache Tez
Issue Type: Bug
Reporter: Rajesh Balamohan
When IFile.Writer is closed, it explicitly calls compressedOut.finish(). Then, as
part of FSDataOutputStream.close(), finish() is called again internally.
Please refer to o.a.h.i.compress.BlockCompressorStream for more details on
finish(). The second call writes an additional 4 bytes to the IFile, which
causes random failures during shuffle. It also prevents IFileInputStream from
doing the proper checksumming.
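To make the effect concrete, here is a minimal sketch of a non-idempotent finish(). ToyBlockStream is a hypothetical stand-in written only for illustration; it is not the Hadoop BlockCompressorStream, and the 4-byte trailer of zeros is an assumption that merely mimics the size of the extra write described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class DoubleFinishDemo {

    /** Hypothetical stand-in for a block compressor stream (not the Hadoop class). */
    static final class ToyBlockStream extends FilterOutputStream {
        private final boolean guarded;
        private boolean finished = false;

        ToyBlockStream(OutputStream out, boolean guarded) {
            super(out);
            this.guarded = guarded;
        }

        /** Writes a 4-byte trailer; repeats it on every call unless guarded. */
        void finish() throws IOException {
            if (guarded && finished) {
                return; // second call becomes a no-op
            }
            out.write(new byte[4]); // assumed trailer: four zero bytes
            finished = true;
        }

        /** Like many stream wrappers, close() finishes the stream itself. */
        @Override
        public void close() throws IOException {
            finish();
            super.close();
        }
    }

    /** Writes 3 payload bytes, then calls finish() explicitly and close(). */
    static int bytesWritten(boolean guarded) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ToyBlockStream stream = new ToyBlockStream(sink, guarded);
            stream.write(new byte[] {1, 2, 3});
            stream.finish(); // explicit call, as in the writer's close path
            stream.close();  // calls finish() a second time internally
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + bytesWritten(false) + " bytes");
        System.out.println("guarded:   " + bytesWritten(true) + " bytes");
    }
}
```

In this toy, the unguarded stream ends up 4 bytes longer than the guarded one. Whether the real fix should guard finish() or drop the explicit call is a separate decision that this sketch does not settle.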
This error happens only when we try to fetch multiple attempt outputs using the
same URL, and it is easily reproducible with the Snappy compression codec. The
first attempt's output is downloaded by the fetcher but, because of the last 4
bytes in the stream, IFileInputStream does not do the proper checksumming. This
causes the download of the subsequent attempt to fail with the following
exception. An example error from the shuffle phase is shown below.
>>>>
2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
java.lang.IllegalArgumentException: Invalid header received: partition: 0
    at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
    at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
    at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
>>>>
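As a rough illustration of why leftover bytes can break the next download on the same connection, the sketch below reads two outputs back to back from one stream. The framing used here ([int partition][int length][payload]) is hypothetical and is not the actual Tez shuffle header; the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class StrayBytesDemo {

    /** Writes one output in the assumed framing, followed by optional stray bytes. */
    static void writeOutput(DataOutputStream out, int partition,
                            byte[] payload, int strayBytes) throws IOException {
        out.writeInt(partition);
        out.writeInt(payload.length);
        out.write(payload);
        out.write(new byte[strayBytes]); // unread leftovers after the payload
    }

    /** Reads the first output fully, then returns the partition parsed for the second. */
    static int secondPartitionSeen(int strayBytes) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            writeOutput(out, 7, new byte[] {1, 2, 3}, strayBytes);
            writeOutput(out, 7, new byte[] {4, 5, 6}, 0);

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            in.readInt();                          // first header: partition
            in.readFully(new byte[in.readInt()]);  // first header: length, then payload
            return in.readInt();                   // what the reader takes as the next partition
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no stray bytes: partition " + secondPartitionSeen(0));
        System.out.println("4 stray bytes:  partition " + secondPartitionSeen(4));
    }
}
```

With no stray bytes the reader sees the partition that was written for the second output; with 4 stray bytes it parses the leftovers as the header instead, so the second output is rejected even though its own data is intact.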
I will attach a debug version of BlockCompressorStream along with a thread dump
(which confirms that finish() is called twice in IFile.close()).