[https://issues.apache.org/jira/browse/HIVE-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992006#comment-15992006]
Sergey Shelukhin commented on HIVE-16548:
-----------------------------------------
I can repro this on the same data... It looks like the determination of what to
read is incorrect: the reader tries to read from the wrong place because the
corresponding range from disk is missing.
E.g. we try to get a compression buffer (CB; or several CBs, due to the ORC
index related safety margin at the end) at [30615403, 30781251) from the stream
at [30572092, 30781251), but the two corresponding chunks from disk and/or
cache are [30517488, 30571555) and [30745405, 30781251). The start of the
range, and probably the entire CB we want, is missing from the data, which can
lead to all kinds of errors. Need to investigate why.
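To make the failure mode concrete, here is a minimal standalone sketch of the coverage check that is effectively being violated; the {{Chunk}} type and {{findGap}} helper are hypothetical illustrations, not Hive's actual {{DiskRangeList}} machinery. Given the chunks that disk/cache actually returned, the requested CB range has a hole right at its start.
{code:java}
import java.util.Arrays;
import java.util.List;

public class RangeCoverageCheck {
  /** Half-open [offset, end) byte range, matching the notation in the comment. */
  static final class Chunk {
    final long offset, end;
    Chunk(long offset, long end) { this.offset = offset; this.end = end; }
  }

  /** Returns the first missing sub-range of [reqStart, reqEnd), or null if fully covered. */
  static Chunk findGap(long reqStart, long reqEnd, List<Chunk> chunks) {
    long pos = reqStart;
    for (Chunk c : chunks) {            // chunks assumed sorted, non-overlapping
      if (c.end <= pos) continue;       // chunk lies entirely before our position
      if (c.offset > pos) {             // hole before this chunk starts
        return new Chunk(pos, Math.min(c.offset, reqEnd));
      }
      pos = c.end;                      // covered up to the end of this chunk
      if (pos >= reqEnd) return null;   // fully covered
    }
    return pos < reqEnd ? new Chunk(pos, reqEnd) : null;
  }

  public static void main(String[] args) {
    // Exact offsets from the comment: the requested CB range vs. the two
    // chunks actually present in disk/cache.
    Chunk gap = findGap(30615403L, 30781251L, Arrays.asList(
        new Chunk(30517488L, 30571555L), new Chunk(30745405L, 30781251L)));
    System.out.println(gap == null
        ? "covered"
        : "missing [" + gap.offset + ", " + gap.end + ")");
  }
}
{code}
Run against the offsets above, this reports the missing sub-range [30615403, 30745405): everything between the end of the first chunk and the start of the second is simply not in the data the reader was handed.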
> LLAP: EncodedReaderImpl.addOneCompressionBuffer throws NPE
> ----------------------------------------------------------
>
> Key: HIVE-16548
> URL: https://issues.apache.org/jira/browse/HIVE-16548
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Rajesh Balamohan
>
> Env: based on the Apr 25 Apache master codebase.
> {noformat}
> Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 65536 needed = 3762509
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:695)
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:454)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:420)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:242)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:239)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 65536 needed = 3762509
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.addOneCompressionBuffer(EncodedReaderImpl.java:1223)
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:813)
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:685)
> ... 15 more
> Caused by: java.io.IOException: java.io.IOException: java.lang.NullPointerException
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
> ... 17 more
> Caused by: java.io.IOException: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:695)
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:454)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:420)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:242)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:239)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.addOneCompressionBuffer(EncodedReaderImpl.java:1282)
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:813)
> at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:685)
> ... 15 more
> {noformat}
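For context on the "Buffer size too small. size = 65536 needed = 3762509" symptom: per the ORC spec, each chunk in a compressed stream starts with a 3-byte little-endian header encoding (chunkLength << 1) | isOriginal, and 65536 is the default 64K compression block size. The sketch below illustrates a plausible mechanism, not the exact EncodedReaderImpl code path: if decoding starts at a misaligned offset (e.g. into the wrong chunk, as described in the comment above), three arbitrary bytes get parsed as a header and can yield a "needed" length far beyond the block size.
{code:java}
public class OrcChunkHeader {
  static final int COMPRESSION_BLOCK_SIZE = 64 * 1024; // 65536, the "size" in the error

  /** Parses an ORC compression-chunk header: 3 bytes, little-endian,
   *  value = (chunkLength << 1) | (1 if the chunk is stored uncompressed). */
  static void parseHeader(byte[] data, int offset) {
    int b0 = data[offset] & 0xff;
    int b1 = data[offset + 1] & 0xff;
    int b2 = data[offset + 2] & 0xff;
    int header = b0 | (b1 << 8) | (b2 << 16);
    boolean isOriginal = (header & 1) != 0;
    int chunkLength = header >>> 1;
    if (chunkLength > COMPRESSION_BLOCK_SIZE) {
      // A bogus header read from the wrong offset trips a check like this one.
      throw new IllegalArgumentException("Buffer size too small. size = "
          + COMPRESSION_BLOCK_SIZE + " needed = " + chunkLength);
    }
    System.out.println("chunk length=" + chunkLength + " original=" + isOriginal);
  }

  public static void main(String[] args) {
    // Three bytes that decode to the exact bogus length from the trace:
    // 3762509 << 1 = 7525018 = 0x72D29A -> little-endian bytes 0x9A 0xD2 0x72.
    parseHeader(new byte[] {(byte) 0x9A, (byte) 0xD2, (byte) 0x72}, 0);
  }
}
{code}
The NPE at EncodedReaderImpl.java:1282 in the last trace would then be a second manifestation of the same root cause: once the underlying range is missing, addOneCompressionBuffer can also end up dereferencing a buffer that was never populated.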