tao meng created HUDI-1722:
------------------------------

             Summary: Hive beeline/spark-sql query on specified fields of a MOR 
table throws NPE
                 Key: HUDI-1722
                 URL: https://issues.apache.org/jira/browse/HUDI-1722
             Project: Apache Hudi
          Issue Type: Bug
          Components: Hive Integration, Spark Integration
    Affects Versions: 0.7.0
         Environment: spark2.4.5, hadoop3.1.1, hive 3.1.1
            Reporter: tao meng
             Fix For: 0.9.0


This problem was introduced by HUDI-892.
That PR skips adding the projection columns when there are no log files in the 
hoodieRealtimeSplit, but it does not take into account that multiple 
getRecordReaders share the same jobConf.
Consider the following scenario.
We have four getRecordReaders:
reader1 (its hoodieRealtimeSplit contains no log files)
reader2 (its hoodieRealtimeSplit contains log files)
reader3 (its hoodieRealtimeSplit contains log files)
reader4 (its hoodieRealtimeSplit contains no log files)

reader1 runs first. HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP in 
jobConf is set to true, and no additional Hudi projection columns are added 
to jobConf because its split has no log files (see 
HoodieParquetRealtimeInputFormat.addProjectionToJobConf).

reader2 runs later. Since HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP in 
jobConf is already true, the additional Hudi projection columns are again not 
added to jobConf, even though this split has log files (see 
HoodieParquetRealtimeInputFormat.addProjectionToJobConf).
As a result, _hoodie_record_key is missing from the projection and the merge 
step throws an exception:

Caused by: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:611)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:518)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:3296)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:252)
 at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:522)
 ... 24 more
Caused by: java.lang.NullPointerException
 at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:93)
 at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:43)
 at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:79)
 at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:36)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:578)
 at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:518)
 at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150)
 at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:3296)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:252)
 at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:522)


The problem is therefore intermittent and depends on reader order. If reader2 
runs first, the additional Hudi projection columns are added to jobConf and the 
query succeeds.
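
The order dependence can be illustrated with a small standalone simulation. 
This is only a sketch and does not use the real Hudi or Hadoop classes: the 
jobConf is modelled as a plain Map, the two configuration keys and the "price" 
column are hypothetical stand-ins, and addProjectionToJobConf here merely 
mimics the skip-if-already-set behaviour described in this report.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedJobConfDemo {
    // Hypothetical stand-ins for the real configuration keys.
    static final String READ_COLUMNS_SET = "demo.hoodie.read.columns.set";
    static final String PROJECTED_COLUMNS = "demo.projected.columns";

    // Mimics the behaviour described in the report: the projection step runs
    // at most once per jobConf, and it skips the Hudi meta column when the
    // split has no log files.
    static void addProjectionToJobConf(Map<String, String> jobConf, boolean splitHasLogFiles) {
        if (jobConf.containsKey(READ_COLUMNS_SET)) {
            return; // an earlier reader already marked this jobConf as handled
        }
        if (splitHasLogFiles) {
            // Append the record key to the existing comma-separated projection.
            jobConf.merge(PROJECTED_COLUMNS, "_hoodie_record_key", (a, b) -> a + "," + b);
        }
        jobConf.put(READ_COLUMNS_SET, "true");
    }

    // Runs the readers in the given order against ONE shared jobConf and
    // returns the final projection.
    static String run(boolean... splitsHaveLogFiles) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(PROJECTED_COLUMNS, "price"); // the column the query asked for
        for (boolean hasLogFiles : splitsHaveLogFiles) {
            addProjectionToJobConf(jobConf, hasLogFiles);
        }
        return jobConf.get(PROJECTED_COLUMNS);
    }

    public static void main(String[] args) {
        // reader1 (no log files) first: the record key is never projected.
        System.out.println(run(false, true, true, false));
        // reader2 (log files) first: the record key is projected.
        System.out.println(run(true, false, true, false));
    }
}
```

In the first ordering the shared projection never gains _hoodie_record_key, 
which corresponds to the failing case; in the second ordering it does, which 
corresponds to the case where the query succeeds.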



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
