Alessandro Solimando created HIVE-26147:
-------------------------------------------

             Summary: OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file
                 Key: HIVE-26147
                 URL: https://issues.apache.org/jira/browse/HIVE-26147
             Project: Hive
          Issue Type: Bug
          Components: ORC, Transactions
    Affects Versions: 4.0.0-alpha-2
            Reporter: Alessandro Solimando
            Assignee: Alessandro Solimando


When _hive.acid.key.index_ is missing for an acid ORC file, _OrcRawRecordMerger_ 
throws a NullPointerException as follows:

{noformat}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
        ... 24 more
{noformat}

For this to happen, the ORC file must have more than one stripe, and the 
offset being sought must fall beyond the first stripe but before the tail 
one, as the code suggests:

{code:java}
    if (firstStripe != 0) {
      minKey = keyIndex[firstStripe - 1];
    }
    if (!isTail) {
      maxKey = keyIndex[firstStripe + stripeCount - 1];
    }
{code}
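
Going only by the two guards in the snippet above, the key index is read 
whenever a split does not start at the first stripe or does not reach the 
tail. The following is a minimal sketch of that condition; the class and 
method names are hypothetical and are not part of Hive:

```java
/** Hypothetical helper, not part of Hive: mirrors the two guards in the snippet. */
final class KeyIndexAccessSketch {

  /**
   * Returns true when the quoted code would index into keyIndex, i.e. when
   * a missing (null) key index would lead to a NullPointerException.
   */
  static boolean readsKeyIndex(int firstStripe, boolean isTail) {
    return firstStripe != 0 || !isTail;
  }

  public static void main(String[] args) {
    // A split covering a whole single-stripe file never touches the index.
    System.out.println(readsKeyIndex(0, true));   // false
    // A split that starts at a later stripe reads keyIndex[firstStripe - 1].
    System.out.println(readsKeyIndex(1, true));   // true
    // A split that stops before the tail reads the max key entry.
    System.out.println(readsKeyIndex(0, false));  // true
  }
}
```

Under this reading, a file with a single stripe always has firstStripe equal 
to 0 and isTail set, so neither branch runs.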

However, when the issue was originally detected, the NPE was triggered even 
by a simple "select *" over a table whose ORC files lacked the 
_hive.acid.key.index_ metadata, although it never failed for ORC files with 
a single stripe. The affected file was generated by a major compaction of 
acid and non-acid data.

To force an offset that falls in a middle stripe, one can use the following 
query, provided one knows which stripe contains a particular value:

{code:sql}
select * from $table where c = $value
{code}

_OrcRawRecordMerger_ should simply leave the min and max keys as "null" when 
the _hive.acid.key.index_ metadata is missing.
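
A minimal sketch of that behaviour follows. It is not the actual Hive patch: 
the class name, the Bounds holder and the use of Long in place of Hive's key 
type are assumptions made to keep the example self-contained, and it assumes 
a missing key index is represented as a null array.

```java
/** Hypothetical, simplified stand-in for the bounds logic; not the Hive class. */
final class KeyBoundsSketch {

  /** Holder for the discovered bounds; a null field means "no bound". */
  static final class Bounds {
    final Long minKey;
    final Long maxKey;

    Bounds(Long minKey, Long maxKey) {
      this.minKey = minKey;
      this.maxKey = maxKey;
    }
  }

  /**
   * Same two guards as the quoted snippet, wrapped in a null check so that
   * a missing key index (assumed here to be null) leaves both keys null.
   */
  static Bounds discoverKeyBounds(Long[] keyIndex, int firstStripe,
                                  int stripeCount, boolean isTail) {
    Long minKey = null;
    Long maxKey = null;
    if (keyIndex != null) {
      if (firstStripe != 0) {
        minKey = keyIndex[firstStripe - 1];
      }
      if (!isTail) {
        maxKey = keyIndex[firstStripe + stripeCount - 1];
      }
    }
    return new Bounds(minKey, maxKey);
  }

  public static void main(String[] args) {
    // Missing index: no exception, both bounds stay null.
    Bounds missing = discoverKeyBounds(null, 1, 1, false);
    System.out.println(missing.minKey + " " + missing.maxKey);

    // Index present: bounds come from the neighbouring stripe entries.
    Bounds present = discoverKeyBounds(new Long[] {10L, 20L, 30L}, 1, 1, false);
    System.out.println(present.minKey + " " + present.maxKey);
  }
}
```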



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
