leilinen opened a new issue #663: Cannot query real time table URL: https://github.com/apache/incubator-hudi/issues/663

Hi, I have written data into HDFS with Hoodie and synced it to Hive with the hoodie-hive tool, which created a Hoodie table and a Hoodie real-time table. I then queried both tables:

```
select count(1) from hudi.waybill_table7;
select count(1) from hudi.waybill_table7_rt;
```

The query against hudi.waybill_table7 succeeded, but the one against hudi.waybill_table7_rt failed with a NullPointerException:

```
com.uber.hoodie.exception.HoodieException: Error obtaining data file/log file grouping: hdfs://10.202.77.200:8020/tmp/leiline/waybill7/2019-04-19
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.lambda$getSplits$9(HoodieRealtimeInputFormat.java:145)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1548)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.getSplits(HoodieRealtimeInputFormat.java:103)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:363)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:486)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1302)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1299)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:434)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.lambda$null$8(HoodieRealtimeInputFormat.java:126)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.lambda$getSplits$9(HoodieRealtimeInputFormat.java:124)
    ... 42 more
Job Submission failed with exception 'com.uber.hoodie.exception.HoodieException(Error obtaining data file/log file grouping: hdfs://tmp/leiline/waybill7/2019-04-19)'
```

The real-time table structure is:

```
CREATE EXTERNAL TABLE `waybill_table7_rt`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `waybillid` string,
  `waybillno` string,
  `destzonecode` string,
  `quantity` int,
  `consignedtm` string,
  `cargotypecode` string,
  `limittypecode` string,
  `expresstypecode` string,
  `versionno` int,
  `lockversionno` int,
  `waybillremarks` string,
  `orderno` string,
  `updatetm` string,
  `createtm` string)
PARTITIONED BY (
  `sourcezonecode` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://tmp/leiline/waybill7'
TBLPROPERTIES (
  'last_commit_time_sync'='20190430150335',
  'transient_lastDdlTime'='1556604654')
```

My Hoodie version is 0.4.6 and I have not changed any Hoodie writer code, so I don't know why this exception is thrown. I also have a real-time table over an earlier dataset written by Hoodie, and that one can be queried without exceptions. I compared the two datasets and found no difference. I would also like to know how I can debug HoodieRealtimeInputFormat locally.
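On the last question: since the stack trace shows `HoodieRealtimeInputFormat.getSplits` failing during split computation inside the Hive CLI client JVM (not in a map task), one generic way to debug it locally is standard JVM remote debugging via JDWP. This is an environment-configuration sketch, not a Hudi-specific procedure; the port `5005` is an arbitrary choice, and whether `HADOOP_CLIENT_OPTS` is picked up depends on your Hive/Hadoop launcher scripts:

```shell
# Sketch (assumptions: Hive CLI honors HADOOP_CLIENT_OPTS; port 5005 is free).
# suspend=y makes the JVM wait until a debugger attaches, so you can set a
# breakpoint at HoodieRealtimeInputFormat.java:126 (the NPE site) beforehand.
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
hive -e 'select count(1) from hudi.waybill_table7_rt;'
# Then attach your IDE's remote debugger to localhost:5005 with the
# hoodie-hadoop-mr 0.4.6 sources on the source path and step through getSplits.
```

Because the failure happens at split planning time, the breakpoint should be hit before any MapReduce job is submitted, which keeps the whole debugging session on one machine.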
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
