[ https://issues.apache.org/jira/browse/HIVE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando updated HIVE-26147:
----------------------------------------
Description:
When _hive.acid.key.index_ is missing for an acid ORC file, _OrcRawRecordMerger_
throws the following exception:
{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    ... 24 more
{noformat}
For the NPE to occur, the ORC file must have more than one stripe, and the
offset of the element to seek must place it either beyond the first stripe (but
before the last one), or in the first stripe when that stripe is not also the
last one, as the code shows:
{code:java}
if (firstStripe != 0) {
  minKey = keyIndex[firstStripe - 1];
}
if (!isTail) {
  maxKey = keyIndex[firstStripe + stripeCount - 1];
}
{code}
However, when the issue was originally detected, the NPE was triggered even by
a simple "select *" over a table whose ORC files were missing the
_hive.acid.key.index_ metadata, although it never failed for ORC files with a
single stripe. The file was generated by a major compaction of acid and
non-acid data.
If "select *" does not trigger the NPE, either filter on the values of the row
returned by "select * from $table limit 1", or try filtering on different
values until the read falls into the situation described above, using a filter
like this:
{code:sql}
select * from $table where c = $value
{code}
_OrcRawRecordMerger_ should simply leave the min and max keys as "null" when
the _hive.acid.key.index_ metadata is missing.
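The proposed behaviour can be illustrated with a minimal, self-contained sketch. It is not the actual Hive patch: the class _KeyBoundsSketch_, the _Bounds_ holder and the use of String keys (instead of Hive's record identifiers) are assumptions made only for illustration. The point is that the two index lookups quoted above are skipped when the key index is absent, so both bounds stay "null".

```java
// Illustrative sketch only; names and types here are hypothetical, not Hive's.
final class KeyBoundsSketch {

    /** Hypothetical holder for the discovered bounds; null means "no bound". */
    static final class Bounds {
        final String minKey;
        final String maxKey;

        Bounds(String minKey, String maxKey) {
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /**
     * Mirrors the two lookups from the issue, guarded by a null check.
     * keyIndex stands in for the parsed hive.acid.key.index metadata and is
     * null when the file does not carry it.
     */
    static Bounds discoverKeyBounds(String[] keyIndex, int firstStripe,
                                    int stripeCount, boolean isTail) {
        String minKey = null;
        String maxKey = null;
        if (keyIndex != null) {
            if (firstStripe != 0) {
                // Last key of the stripe preceding the first one being read.
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                // Last key of the final stripe being read.
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new Bounds(minKey, maxKey);
    }

    public static void main(String[] args) {
        // Index present: a read of the middle stripe of three gets both bounds.
        String[] index = {"k0", "k1", "k2"};
        Bounds withIndex = discoverKeyBounds(index, 1, 1, false);
        System.out.println("min=" + withIndex.minKey + " max=" + withIndex.maxKey);

        // Index missing: the same read no longer fails, both bounds stay null.
        Bounds withoutIndex = discoverKeyBounds(null, 1, 1, false);
        System.out.println("min=" + withoutIndex.minKey + " max=" + withoutIndex.maxKey);
    }
}
```

With null bounds the reader cannot narrow the key range it considers, so the sketch trades a possible loss of pruning for not failing on files that lack the index.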
was:
When _hive.acid.key.index_ is missing for an acid ORC file _OrcRawRecordMerger_
throws as follows:
{noformat}
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
    ... 24 more
{noformat}
For this situation to happen, the ORC file must have more than one stripe, and
the offset of the element to seek should be locate it beyond the first stripe
but before the tail one, as the code clearly suggests:
{code:java}
if (firstStripe != 0) {
minKey = keyIndex[firstStripe - 1];
}
if (!isTail) {
maxKey = keyIndex[firstStripe + stripeCount - 1];
}
{code}
However, in the context of the detection of the original issue, the NPE was
triggered even by a simple "select *" over a table with ORC files missing the
_hive.acid.key.index_ metadata information, but it was never failing for ORC
files with a single stripe. The file was generated after a major compaction of
acid and non-acid data.
In order to force an offset located in a stripe in the middle, one can use the
following query, knowing in what stripe a particular value exists:
{code:sql}
select * from $table where c = $value
{code}
_OrcRawRecordMerger_ should simply leave as "null" the min and max keys when
the _hive.acid.key.index_ metadata is missing.
> OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file
> ----------------------------------------------------------------------------------
>
> Key: HIVE-26147
> URL: https://issues.apache.org/jira/browse/HIVE-26147
> Project: Hive
> Issue Type: Bug
> Components: ORC, Transactions
> Affects Versions: 4.0.0-alpha-2
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major