[
https://issues.apache.org/jira/browse/MAPREDUCE-6635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158612#comment-15158612
]
Hudson commented on MAPREDUCE-6635:
-----------------------------------
FAILURE: Integrated in Hadoop-trunk-Commit #9346 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/9346/])
MAPREDUCE-6635. Unsafe long to int conversion in
UncompressedSplitLineReader and IndexOutOfBoundsException (vvasudev: rev
c6f2d761d5430eac6b9f07f137a7028de4e0660c)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
> Unsafe long to int conversion in UncompressedSplitLineReader and
> IndexOutOfBoundsException
> ------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6635
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6635
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Junping Du
> Priority: Critical
> Fix For: 2.8.0, 2.7.3, 2.6.5
>
> Attachments: MAPREDUCE-6635.patch
>
>
> LineRecordReader creates the reader for an uncompressed split like so:
> {noformat}
> in = new UncompressedSplitLineReader(
>     fileIn, job, recordDelimiter, split.getLength());
> {noformat}
> The split length is stored as a long:
> {noformat}
> private long splitLength;
> {noformat}
> At some point when reading the first line, fillBuffer does this:
> {noformat}
> @Override
> protected int fillBuffer(InputStream in, byte[] buffer, boolean inDelimiter)
>     throws IOException {
>   int maxBytesToRead = buffer.length;
>   if (totalBytesRead < splitLength) {
>     maxBytesToRead = Math.min(maxBytesToRead,
>         (int)(splitLength - totalBytesRead));
> {noformat}
> The narrowing cast keeps only the low 32 bits of splitLength -
> totalBytesRead, so for sufficiently large splits maxBytesToRead becomes
> negative, and the subsequent DFS read fails its boundary check:
> {noformat}
> java.lang.IndexOutOfBoundsException
>   at java.nio.Buffer.checkBounds(Buffer.java:559)
>   at java.nio.ByteBuffer.get(ByteBuffer.java:668)
>   at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:279)
>   at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:172)
>   at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:744)
>   at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:800)
>   at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:860)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:903)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.fillBuffer(UncompressedSplitLineReader.java:59)
>   at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
>   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
>   at org.apache.hadoop.mapreduce.lib.input.UncompressedSplitLineReader.readLine(UncompressedSplitLineReader.java:91)
>   at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.skipUtfByteOrderMark(LineRecordReader.java:144)
>   at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:184)
> {noformat}
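> To make the failure mode concrete, here is a minimal standalone sketch of
> the narrowing cast. The values are hypothetical, not taken from the patch:
> a ~6 GB split length whose low 32 bits carry the sign bit, and 65536
> standing in for buffer.length.
> {noformat}
> // Standalone demonstration of the cast in fillBuffer().
> public class CastOverflowDemo {
>   public static void main(String[] args) {
>     long splitLength = 6442450944L;  // hypothetical ~6 GB split (0x180000000)
>     long totalBytesRead = 0L;
>     int maxBytesToRead = 65536;      // stands in for buffer.length
>
>     // splitLength - totalBytesRead does not fit in an int; the narrowing
>     // cast keeps only the low 32 bits, which here is Integer.MIN_VALUE.
>     maxBytesToRead = Math.min(maxBytesToRead,
>         (int) (splitLength - totalBytesRead));
>
>     // Prints -2147483648; a read issued with this length fails the
>     // boundary check shown in the stack trace above.
>     System.out.println(maxBytesToRead);
>   }
> }
> {noformat}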
> This has been reported at https://issues.streamsets.com/browse/SDC-2229; it
> also happens in Hive when very large text files are forced into a single
> split (e.g. via the header-skipping feature, or via set
> mapred.min.split.size=9999999999999999).
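> One way to guard the cast is to keep the remainder in long arithmetic and
> narrow only when it actually fits in an int. A minimal sketch of that shape
> follows (the variable name is hypothetical, and this is not necessarily the
> committed patch); the comparison happens in 64-bit arithmetic, so the
> negative intermediate never appears:
> {noformat}
> int maxBytesToRead = buffer.length;
> if (totalBytesRead < splitLength) {
>   long leftBytesForSplit = splitLength - totalBytesRead;
>   // Narrow only when the remainder fits below the buffer length;
>   // otherwise the buffer length is already the binding limit.
>   if (leftBytesForSplit < maxBytesToRead) {
>     maxBytesToRead = (int) leftBytesForSplit;
>   }
> }
> {noformat}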