[ https://issues.apache.org/jira/browse/HIVE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-6320:
-----------------------------
Description:
The ORC data reader crashes with a BufferUnderflowException while trying to read
data row by row with predicate push-down enabled on current trunk.
*Stack trace:*
{code}
Caused by: java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:472)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:117)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:207)
    at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:240)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:53)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:288)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$IntTreeReader.next(RecordReaderImpl.java:510)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1581)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2707)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:125)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:101)
{code}
OR it could be
{code}
Caused by: java.lang.IndexOutOfBoundsException
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:352)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:180)
    at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:197)
    at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readInts(SerializationUtils.java:450)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readDirectValues(RunLengthIntegerReaderV2.java:252)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:59)
    at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:300)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:475)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1159)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2198)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:108)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:57)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
    ... 15 more
{code}
The query run is
{code}
set hive.vectorized.execution.enabled=false;
set hive.optimize.index.filter=true;
insert overwrite directory '/tmp/foo' select * from lineitem where l_orderkey is not null;
{code}
*Reason:*
The issue is in how the disk range boundaries are generated. If two adjacent row
groups have the same compressed block offset, the worst-case slop added to the
end offset covers only the current compression block. In some cases, values
towards the end of that compression block extend beyond this boundary, and
fetching them causes a BufferUnderflowException or IndexOutOfBoundsException.
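The sketch below is a minimal, hypothetical illustration of that failure mode.
The class, method, and slop constant are assumptions made for the example and
are not taken from RecordReaderImpl.
{code}
// Hypothetical sketch of the boundary computation described above; all names
// and the slop constant are illustrative assumptions, not actual ORC code.
public class DiskRangeSketch {

  // Assumed worst case: one compressed block plus its 3-byte block header.
  static final long COMPRESSED_BLOCK_SIZE = 256 * 1024;
  static final long WORST_CASE_SLOP = COMPRESSED_BLOCK_SIZE + 3;

  // End offset of the disk range for a row group, derived from the compressed
  // block offset at which the next row group starts.
  static long rangeEnd(long nextGroupBlockOffset) {
    return nextGroupBlockOffset + WORST_CASE_SLOP;
  }

  public static void main(String[] args) {
    long currentGroupBlockOffset = 1_000_000L;
    // Two adjacent row groups that start inside the same compressed block
    // share the same block offset, so the computed range only covers that
    // single block.
    long nextGroupBlockOffset = currentGroupBlockOffset;
    long end = rangeEnd(nextGroupBlockOffset);

    // A run of values that begins near the end of this block can continue
    // into the next compressed block, i.e. past 'end'. Decoding it then reads
    // beyond the buffered data and fails with BufferUnderflowException or
    // IndexOutOfBoundsException.
    System.out.println("disk range ends at " + end
        + "; values encoded past this offset are not in the buffer");
  }
}
{code}
The point of the sketch is only that a range ending at the end of the shared
compression block is too short whenever an encoded run crosses that block
boundary.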
> Row-based ORC reader with PPD turned on dies on BufferUnderFlowException/IndexOutOfBoundsException
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-6320
> URL: https://issues.apache.org/jira/browse/HIVE-6320
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.13.0
> Reporter: Gopal V
> Assignee: Prasanth J
> Labels: orcfile
> Fix For: 0.13.0
>
> Attachments: HIVE-6320.1.patch, HIVE-6320.2.patch, HIVE-6320.2.patch, HIVE-6320.3.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)