Sergey Shelukhin created MAPREDUCE-6635:
-------------------------------------------
Summary: Unsafe long to int conversion in
UncompressedSplitLineReader and IndexOutOfBoundsException
Key: MAPREDUCE-6635
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6635
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Sergey Shelukhin
LineRecordReader creates the uncompressed-split reader like so:
{noformat}
in = new UncompressedSplitLineReader(
    fileIn, job, recordDelimiter, split.getLength());
{noformat}
Split length goes to
{noformat}
private long splitLength;
{noformat}
At some point when reading the first line, fillBuffer does this:
{noformat}
@Override
protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
    throws IOException {
  int maxBytesToRead = buffer.length;
  if (totalBytesRead < splitLength) {
    maxBytesToRead = Math.min(maxBytesToRead,
        (int)(splitLength - totalBytesRead));
{noformat}
When the remaining split length exceeds Integer.MAX_VALUE, the narrowing (int) cast can produce a negative maxBytesToRead, and the subsequent DFS read then fails a boundary check with an IndexOutOfBoundsException.
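A standalone sketch of the overflow (names are illustrative, not the actual Hadoop code), together with one possible fix that takes the min in long arithmetic before casting:

```java
// Sketch of the narrowing-cast bug in fillBuffer and a possible fix.
public class CastOverflowDemo {

    // Mirrors the reported pattern: the long remainder is cast to int
    // before the min is taken, so when the low 32 bits of the remainder
    // have the sign bit set, the result goes negative.
    static int buggyMaxBytesToRead(int bufferLen, long splitLength,
            long totalBytesRead) {
        int maxBytesToRead = bufferLen;
        if (totalBytesRead < splitLength) {
            maxBytesToRead = Math.min(maxBytesToRead,
                (int) (splitLength - totalBytesRead)); // narrowing cast overflows
        }
        return maxBytesToRead;
    }

    // Possible fix: compare in long arithmetic first; the result is then
    // bounded above by bufferLen, so the final cast is always safe.
    static int fixedMaxBytesToRead(int bufferLen, long splitLength,
            long totalBytesRead) {
        int maxBytesToRead = bufferLen;
        if (totalBytesRead < splitLength) {
            maxBytesToRead = (int) Math.min((long) maxBytesToRead,
                splitLength - totalBytesRead);
        }
        return maxBytesToRead;
    }

    public static void main(String[] args) {
        long splitLength = 3_000_000_000L; // a ~3 GB forced single split
        // (int) 3_000_000_000L wraps to -1294967296, so the buggy version
        // returns a negative read length while the fixed one stays at 65536.
        System.out.println("buggy: " + buggyMaxBytesToRead(65536, splitLength, 0L));
        System.out.println("fixed: " + fixedMaxBytesToRead(65536, splitLength, 0L));
    }
}
```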
This has been reported here: https://issues.streamsets.com/browse/SDC-2229. It
also happens in Hive when very large text files are forced into a single split
(e.g. via the header-skipping feature, or via set
mapred.min.split.size=9999999999999999;).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)