xiarixiaoyao opened a new pull request #2722:
URL: https://github.com/apache/hudi/pull/2722


   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   Fix the bug introduced by HUDI-892
   HUDI-892 introduce this problem。
   this pr skip adding projection columns if there are no log files in the 
hoodieRealtimeSplit。 but this pr donnot consider that multiple getRecordReaders 
share same jobConf。
   Consider the following questions:
   we have four getRecordReaders:
   reader1(its hoodieRealtimeSplit contains no log files)
   reader2 (its hoodieRealtimeSplit contains log files)
   reader3(its hoodieRealtimeSplit contains log files)
   reader4(its hoodieRealtimeSplit contains no log files)
   
   now reader1 run first, HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP in 
jobConf will be set to be true, and no hoodie additional projection columns 
will be added to jobConf (see 
HoodieParquetRealtimeInputFormat.addProjectionToJobConf)
   
   reader2 run later, since HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP in 
jobConf is set to be true, no hoodie additional projection columns will be 
added to jobConf. (see HoodieParquetRealtimeInputFormat.addProjectionToJobConf)
   which lead to the result that _hoodie_record_key would be missing and merge 
step would throw exceptions
   
   2021-03-25 20:23:14,014 | INFO  | AsyncDispatcher event handler | 
Diagnostics report from attempt_1615883368881_0038_m_000000_0: Error: 
java.lang.NullPointerException2021-03-25 20:23:14,014 | INFO  | AsyncDispatcher 
event handler | Diagnostics report from attempt_1615883368881_0038_m_000000_0: 
Error: java.lang.NullPointerException at 
org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:101)
 at 
org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:43)
 at 
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:79)
 at 
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:36)
 at 
org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:92)
 at 
org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.next(RealtimeCompactedRecordReader.java:43)
 at org.apache.hudi.
 
hadoop.realtime.HoodieRealtimeRecordReader.next(HoodieRealtimeRecordReader.java:79)
 at 
org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.next(HoodieCombineRealtimeRecordReader.java:68)
 at 
org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.next(HoodieCombineRealtimeRecordReader.java:77)
 at 
org.apache.hudi.hadoop.realtime.HoodieCombineRealtimeRecordReader.next(HoodieCombineRealtimeRecordReader.java:42)
 at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:205)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:191) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at 
org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:183) at 
java.security.AccessController.doPrivileged(Native Method) at javax
 .security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:177)
   
    
   Obviously, this is an occasional problem。 if reader2 run first, hoodie 
additional projection columns will be added to jobConf and in this case the 
query will be ok
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   
   This pull request is already covered by existing tests
   
   
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to