leilinen opened a new issue #663: Cannot query real time table URL: https://github.com/apache/incubator-hudi/issues/663

Hi, I have written data into HDFS with Hoodie and synced it to Hive with the hoodie-hive tool, which created a Hoodie table and a Hoodie real-time table. I then queried both tables:

```
select count(1) from hudi.waybill_table7;
select count(1) from hudi.waybill_table7_rt;
```

The query against hudi.waybill_table7 succeeded, but the one against hudi.waybill_table7_rt failed with a NullPointerException:

```
com.uber.hoodie.exception.HoodieException: Error obtaining data file/log file grouping: hdfs://10.202.77.200:8020/tmp/leiline/waybill7/2019-04-19
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.lambda$getSplits$9(HoodieRealtimeInputFormat.java:145)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1548)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.getSplits(HoodieRealtimeInputFormat.java:103)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:363)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:486)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1302)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1299)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:434)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.lambda$null$8(HoodieRealtimeInputFormat.java:126)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat.lambda$getSplits$9(HoodieRealtimeInputFormat.java:124)
    ... 42 more
Job Submission failed with exception 'com.uber.hoodie.exception.HoodieException(Error obtaining data file/log file grouping: hdfs://tmp/leiline/waybill7/2019-04-19)'
```

The real-time table structure is:

```
CREATE EXTERNAL TABLE `waybill_table7_rt`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `waybillid` string,
  `waybillno` string,
  `destzonecode` string,
  `quantity` int,
  `consignedtm` string,
  `cargotypecode` string,
  `limittypecode` string,
  `expresstypecode` string,
  `versionno` int,
  `lockversionno` int,
  `waybillremarks` string,
  `orderno` string,
  `updatetm` string,
  `createtm` string)
PARTITIONED BY (
  `sourcezonecode` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.uber.hoodie.hadoop.realtime.HoodieRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://tmp/leiline/waybill7'
TBLPROPERTIES (
  'last_commit_time_sync'='20190430150335',
  'transient_lastDdlTime'='1556604654')
```

My Hoodie version is 0.4.6 and I have not changed any Hoodie writer code, so I don't know why this exception is thrown. I also have a real-time table over an earlier dataset written by Hoodie, and that one can be queried without exceptions. I compared the two datasets and found no difference. I would also like to know how I can debug HoodieRealtimeInputFormat locally.
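On the last question: since the stack trace shows `HoodieRealtimeInputFormat.getSplits` failing during split computation inside the Hive CLI client JVM (not in a map task), one generic way to debug it locally is standard JVM remote debugging via JDWP. This is an environment-configuration sketch, not a Hudi-specific procedure; the port `5005` is an arbitrary choice, and whether `HADOOP_CLIENT_OPTS` is picked up depends on your Hive/Hadoop launcher scripts:

```shell
# Sketch (assumptions: Hive CLI honors HADOOP_CLIENT_OPTS; port 5005 is free).
# suspend=y makes the JVM wait until a debugger attaches, so you can set a
# breakpoint at HoodieRealtimeInputFormat.java:126 (the NPE site) beforehand.
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
hive -e 'select count(1) from hudi.waybill_table7_rt;'
# Then attach your IDE's remote debugger to localhost:5005 with the
# hoodie-hadoop-mr 0.4.6 sources on the source path and step through getSplits.
```

Because the failure happens at split planning time, the breakpoint should be hit before any MapReduce job is submitted, which keeps the whole debugging session on one machine.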
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
