[ https://issues.apache.org/jira/browse/HADOOP-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-5459:
----------------------------------
Assignee: Chris Douglas
Status: Patch Available (was: Open)
> CRC errors not detected reading intermediate output into memory with
> problematic length
> ---------------------------------------------------------------------------------------
>
> Key: HADOOP-5459
> URL: https://issues.apache.org/jira/browse/HADOOP-5459
> Project: Hadoop Core
> Issue Type: Bug
> Affects Versions: 0.20.0
> Reporter: Chris Douglas
> Assignee: Chris Douglas
> Priority: Blocker
> Attachments: 5459-0.patch, 5459-1.patch
>
>
> It's possible for the expected, uncompressed length of the segment to be less
> than the decompressed data actually available. This can happen in some worst
> cases for compression, but it is exceedingly rare. It is also possible (though
> fantastically unlikely) for the data to deflate to a size greater than that
> reported by the map. CRC errors will remain undetected because IFileInputStream
> does not validate the checksum until the end of the stream, and close() does
> not advance the stream to the end of the segment. The (abbreviated) read loop
> fetching data in shuffleInMemory:
> {code}
> int n = input.read(shuffleData, 0, shuffleData.length);
> while (n > 0) {
>   bytesRead += n;
>   n = input.read(shuffleData, bytesRead,
>                  (shuffleData.length - bytesRead));
> }
> {code}
> will read only up to the expected length. Without reading the whole segment,
> the checksum is never validated. An IFileInputStream should validate the
> checksum whenever it is closed, even if the stream has not been read to its end.
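>
> As a rough sketch of one way to surface the error (not necessarily what the
> attached patches do): after filling the buffer, drain the stream to EOF so
> that IFileInputStream reaches the trailing checksum and can verify it. The
> scratch buffer below is purely illustrative.
> {code}
> // Hypothetical sketch: keep reading past the expected length until EOF so
> // the trailing checksum is actually verified; a corrupt segment should then
> // fail here instead of being accepted silently.
> byte[] scratch = new byte[4096];
> while (input.read(scratch, 0, scratch.length) != -1) {
>   // discard any unexpected trailing bytes
> }
> {code}
> The cleaner fix, as argued above, is for IFileInputStream itself to validate
> the checksum whenever it is closed, so every caller gets the check without
> draining the stream by hand.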
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.