[ https://issues.apache.org/jira/browse/TEZ-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-1634:
----------------------------------
    Description: 
When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and 
FSDataOutputStream.close() then calls finish() a second time internally. See 
org.apache.hadoop.io.compress.BlockCompressorStream for details on finish(). 
The second call writes an additional 4 bytes to the IFile. This causes 
intermittent shuffle failures and prevents IFileInputStream from checksumming 
the data correctly.
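
To illustrate why a second finish() matters, here is a minimal Java sketch of a 
hypothetical, simplified block writer. ToyBlockWriter and DoubleFinishDemo are 
invented names, and this is not Hadoop's BlockCompressorStream. Its finish() 
writes a 4-byte length field every time it runs unless it is guarded to be 
idempotent, so an explicit finish() followed by close() leaves 4 extra bytes. 
The idempotent guard only shows one way to make a repeated call harmless; it is 
an assumption for illustration, not the actual Tez fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical model only: NOT Hadoop's BlockCompressorStream.
public class DoubleFinishDemo {

    /** Toy writer whose finish() appends a 4-byte length field for the block. */
    static final class ToyBlockWriter implements Closeable {
        private final DataOutputStream out;
        private final boolean idempotentFinish; // if true, a second finish() is a no-op
        private int bytesInBlock = 0;
        private boolean finished = false;

        ToyBlockWriter(OutputStream sink, boolean idempotentFinish) {
            this.out = new DataOutputStream(sink);
            this.idempotentFinish = idempotentFinish;
        }

        void write(byte[] data) throws IOException {
            out.write(data);              // toy: data is stored uncompressed
            bytesInBlock += data.length;
        }

        void finish() throws IOException {
            if (idempotentFinish && finished) {
                return;                   // already finished: write nothing more
            }
            out.writeInt(bytesInBlock);   // 4-byte length field, written on every call
            bytesInBlock = 0;
            finished = true;
        }

        @Override
        public void close() throws IOException {
            finish();                     // close() finishes again, as described above
            out.close();
        }
    }

    /** Writes 8 bytes, calls finish() explicitly, then close(); returns total size. */
    public static int writeRecordAndClose(boolean idempotentFinish) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources invokes close() (and thus finish()) when the block exits
        try (ToyBlockWriter w = new ToyBlockWriter(sink, idempotentFinish)) {
            w.write(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
            w.finish();                   // explicit finish, like IFile.Writer.close()
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("non-idempotent finish(): " + writeRecordAndClose(false) + " bytes");
        System.out.println("idempotent finish():     " + writeRecordAndClose(true) + " bytes");
    }
}
```

Running main reports 16 bytes for the unguarded writer and 12 for the guarded 
one; the 4-byte difference corresponds to the extra bytes described above.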

This error occurs only when multiple attempt outputs are fetched over the same 
URL, and it is easily reproducible with SnappyCodec. The fetcher downloads the 
first attempt's output, but because of the trailing 4 bytes in the stream, 
IFileInputStream does not checksum it correctly. The download of the 
subsequent attempt then fails with the following exception.

An example error from the shuffle phase is shown below.

>>>>
2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
java.lang.IllegalArgumentException: Invalid header received:  partition: 0
        at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
        at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
        at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
>>>>
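
The failure mode on the fetch side can be sketched in the same way. The Java 
model below assumes a made-up framing ([int marker][int length][payload]) 
rather than Tez's real shuffle header; StrayBytesDemo, HEADER_MAGIC, 
buildStream and countValidOutputs are hypothetical names. When 4 unaccounted 
bytes follow each payload, the first output is read successfully, but the next 
header read starts on the stray bytes and is rejected, which mirrors the 
"Invalid header received" error above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model: several outputs sent back to back over one connection.
// Framing is an assumption for illustration, NOT Tez's ShuffleHeader format.
public class StrayBytesDemo {
    static final int HEADER_MAGIC = 0x7E57_0001; // hypothetical header marker

    /** Builds `count` framed outputs; optionally appends 4 stray bytes after each payload. */
    public static byte[] buildStream(int count, boolean strayTrailer) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (int i = 0; i < count; i++) {
                byte[] payload = {10, 20, 30};
                out.writeInt(HEADER_MAGIC);
                out.writeInt(payload.length);
                out.write(payload);
                if (strayTrailer) {
                    out.writeInt(0); // 4 extra bytes not covered by the declared length
                }
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns how many outputs were read before an invalid header was seen. */
    public static int countValidOutputs(byte[] stream, int expected) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            for (int i = 0; i < expected; i++) {
                int magic = in.readInt();
                if (magic != HEADER_MAGIC) {
                    System.out.println("Invalid header received for output " + i);
                    return i;
                }
                int len = in.readInt();
                in.readFully(new byte[len]); // consume exactly the declared payload
            }
            return expected;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("clean stream: " + countValidOutputs(buildStream(3, false), 3) + " of 3 read");
        System.out.println("stray bytes:  " + countValidOutputs(buildStream(3, true), 3) + " of 3 read");
    }
}
```

With the clean stream all 3 outputs are read; with the stray trailer only the 
first one is, and the second header is rejected.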

I will attach a debug version of BlockCompressorStream along with a thread dump 
(which confirms that finish() is called twice in IFile.close()). This bug was 
also present in earlier versions of Tez, and we can now reproduce it 
consistently on a local VM.

  was:
When IFile.Writer is closed, it explicitly calls compressedOut.finish(), and 
FSDataOutputStream.close() then calls finish() a second time internally. See 
org.apache.hadoop.io.compress.BlockCompressorStream for details on finish(). 
The second call writes an additional 4 bytes to the IFile. This causes 
intermittent shuffle failures and prevents IFileInputStream from checksumming 
the data correctly.

This error occurs only when multiple attempt outputs are fetched over the same 
URL, and it is easily reproducible with SnappyCodec. The fetcher downloads the 
first attempt's output, but because of the trailing 4 bytes in the stream, 
IFileInputStream does not checksum it correctly. The download of the 
subsequent attempt then fails with the following exception.

An example error from the shuffle phase is shown below.

>>>>
2014-09-15 09:54:22,950 WARN [fetcher [scope_41] #31] org.apache.tez.runtime.library.common.shuffle.impl.Fetcher: Invalid map id
java.lang.IllegalArgumentException: Invalid header received:  partition: 0
        at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyMapOutput(Fetcher.java:352)
        at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.copyFromHost(Fetcher.java:294)
        at org.apache.tez.runtime.library.common.shuffle.impl.Fetcher.run(Fetcher.java:160)
>>>>

I will attach a debug version of BlockCompressorStream along with a thread dump 
(which confirms that finish() is called twice in IFile.close()).


> BlockCompressorStream.finish() is called twice in IFile.close leading to Shuffle errors
> ---------------------------------------------------------------------------------------
>
>                 Key: TEZ-1634
>                 URL: https://issues.apache.org/jira/browse/TEZ-1634
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
