CRC errors not detected reading intermediate output into memory with 
problematic length
---------------------------------------------------------------------------------------

                 Key: HADOOP-5459
                 URL: https://issues.apache.org/jira/browse/HADOOP-5459
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Chris Douglas
            Priority: Blocker


It's possible for the expected, uncompressed length of the segment to be less 
than the length of the available, decompressed data. This can happen in some 
worst cases for compression, but it is exceedingly rare. It is also possible 
(though fantastically unlikely) for the data to deflate to a size greater than 
that reported by the map. In either case, CRC errors will remain undetected 
because IFileInputStream does not validate the checksum until the end of the 
stream is reached, and close() does not advance the stream to the end of the 
segment. The (abbreviated) read loop fetching data in shuffleInMemory:

{code}
int bytesRead = 0;
int n = input.read(shuffleData, 0, shuffleData.length);
// Stops once shuffleData.length (the expected length) bytes have been read,
// even if the checksummed stream holds more data beyond that point.
while (n > 0) {
  bytesRead += n;
  n = input.read(shuffleData, bytesRead,
                 (shuffleData.length - bytesRead));
}
{code}

will read only up to the expected length. Without reading the whole segment, 
the checksum is not validated. IFileInputStream should validate the checksum 
even when an instance is closed before the segment has been read to the end.
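
As a stopgap on the caller side, the byte count could be checked after the loop 
and one extra read attempted. The sketch below is illustrative only, not the 
actual patch; it assumes input and shuffleData as in the loop above, and whether 
the extra read forces IFileInputStream to verify its checksum depends on its 
internals:

{code}
// Hypothetical guard after the read loop above (not the committed fix).
if (bytesRead < shuffleData.length) {
  // The stream ended before the expected length was reached.
  throw new IOException("Read " + bytesRead + " bytes, expected "
                        + shuffleData.length);
}
if (input.read(new byte[1], 0, 1) >= 0) {
  // The stream still holds data beyond the expected length.
  throw new IOException("Segment longer than expected length "
                        + shuffleData.length);
}
{code}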

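For the validate-on-close behaviour itself, a minimal sketch of the idea is a 
FilterInputStream that drains the unread payload and checks a trailing CRC32 in 
close(). The class name and the payload-plus-4-byte-trailer layout below are 
assumptions for illustration and do not reflect IFile's actual on-disk format or 
IFileInputStream's internals:

{code}
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Sketch only: a payload of known length followed by a 4-byte CRC32 trailer.
class ChecksumOnCloseInputStream extends FilterInputStream {
  private final CRC32 crc = new CRC32();
  private final long dataLength;   // payload bytes, excluding the trailer
  private long consumed = 0;

  ChecksumOnCloseInputStream(InputStream in, long dataLength) {
    super(in);
    this.dataLength = dataLength;
  }

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    int n = read(one, 0, 1);
    return n < 0 ? -1 : (one[0] & 0xff);
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    long remaining = dataLength - consumed;
    if (remaining <= 0) {
      return -1;
    }
    int n = in.read(b, off, (int) Math.min(len, remaining));
    if (n > 0) {
      crc.update(b, off, n);
      consumed += n;
    }
    return n;
  }

  @Override
  public void close() throws IOException {
    // Drain whatever the caller left unread so the CRC covers the whole
    // payload, then compare against the stored trailer before closing.
    byte[] buf = new byte[4096];
    while (consumed < dataLength) {
      if (read(buf, 0, buf.length) < 0) {
        throw new IOException("Unexpected end of stream while draining");
      }
    }
    long stored = 0;
    for (int i = 0; i < 4; i++) {
      int b = in.read();
      if (b < 0) {
        throw new IOException("Missing checksum trailer");
      }
      stored = (stored << 8) | (b & 0xff);
    }
    if (stored != crc.getValue()) {
      throw new IOException("Checksum mismatch: stored " + stored
                            + ", computed " + crc.getValue());
    }
    in.close();
  }
}
{code}

With that behaviour, the shuffle would fail on a corrupt segment even when the 
caller stops reading at the expected length and simply closes the stream.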
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
