GintokiYs opened a new issue #2513: URL: https://github.com/apache/hudi/issues/2513
**Describe the problem you faced**

When I insert data through Hudi-Spark and sync the data to Hive, I can query this COW table from the Hive CLI and get the data back (hudi-hadoop-mr-bundle-0.6.0 has been placed under `${HIVE_HOME}/lib`):

```
hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
Query ID = root_20210201150400_fbb4e52b-c41d-4d6b-b1b8-4678a6642f2d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/02/01 15:04:02 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/01 15:04:02 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
Starting Job = job_1611822796186_0064, Tracking URL = http://node103:8088/proxy/application_1611822796186_0064/
Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0064
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2021-02-01 15:04:10,216 Stage-1 map = 0%, reduce = 0%
2021-02-01 15:04:17,510 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.01 sec
MapReduce Total cumulative CPU time: 6 seconds 10 msec
Ended Job = job_1611822796186_0064
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 6.01 sec   HDFS Read: 10327962 HDFS Write: 397 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 10 msec
OK
20210201145958 20210201145958_0_9 10000301345/001942775096/2 20190909 e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-6-8_20210201145958.parquet 10000301345 NULL 20190505 001942775096 2 251942775095 1942775095 401345 A 222 223 02 301346 NULL NULL NULL NULL NULL 25 1612162791775 10000301345/001942775096/2 20190909
Time taken: 17.764 seconds, Fetched: 1 row(s)
```

But when I **set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat**, I encountered the following error:
```
hive> set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat;
hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
Query ID = root_20210201151617_b6b7ee22-cc7a-4e22-b318-2e952d74e8dc
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/02/01 15:16:17 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/01 15:16:17 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/01 15:16:18 INFO utils.HoodieInputFormatUtils: Reading hoodie metadata from path hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
21/02/01 15:16:18 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
21/02/01 15:16:18 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@78d6692f, file:/etc/hive/conf.cloudera.hive/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_815164917_1, ugi=root (auth:SIMPLE)]]]
21/02/01 15:16:18 INFO table.HoodieTableConfig: Loading table properties from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1/.hoodie/hoodie.properties
21/02/01 15:16:18 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
21/02/01 15:16:18 INFO hadoop.HoodieParquetInputFormat: Found a total of 1 groups
21/02/01 15:16:18 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210201145958__commit__COMPLETED], [20210201150644__commit__COMPLETED]]
21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20190909, #FileGroups=1
21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20180909, #FileGroups=2
21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20181230, #FileGroups=1
21/02/01 15:16:18 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=8, NumFileGroups=4, FileGroupsCreationTime=10, StoreTimeTaken=4
21/02/01 15:16:18 INFO utils.HoodieInputFormatUtils: Total paths to process after hoodie filter 4
Starting Job = job_1611822796186_0067, Tracking URL = http://node103:8088/proxy/application_1611822796186_0067/
Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0067
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2021-02-01 15:16:25,401 Stage-1 map = 0%, reduce = 0%
2021-02-01 15:16:50,198 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1611822796186_0067 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1611822796186_0067_m_000003 (and more) from job job_1611822796186_0067
Examining task ID: task_1611822796186_0067_m_000002 (and more) from job job_1611822796186_0067
Task with the most failures(4):
-----
Task ID: task_1611822796186_0067_m_000000
URL: http://node103:8088/taskdetails.jsp?jobid=job_1611822796186_0067&tipid=task_1611822796186_0067_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:101)
	at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:447)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1109)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:477)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	... 8 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4   HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
```

**Environment Description**

* Hudi version : 0.6.0
* Spark version : 2.4.0+cdh6.2.1
* Hive version : 2.1.1+cdh6.2.1
* Hadoop version : 3.0.0+cdh6.2.1
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
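Editor's note, as a hedged aside for readers hitting the same trace: `HoodieParquetInputFormat` is normally registered as the *table-level* InputFormat by Hudi's hive sync (as the successful first query shows), while the Hudi query documentation of this era points the *session-level* `hive.input.format` at a `HiveInputFormat` variant (e.g. `org.apache.hadoop.hive.ql.io.HiveInputFormat` or `org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat`) rather than at `HoodieParquetInputFormat` directly. A sketch of such a session, reusing the table and predicate from the report; the property values are from the docs as remembered and should be verified against the Hudi version in use:

```
-- Session-level setting for querying a Hudi table from Hive on MapReduce.
-- Assumption: the table's own InputFormat was already set to
-- org.apache.hudi.hadoop.HoodieParquetInputFormat by hive sync.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SELECT * FROM hudi_imp_par_mor_local_x1 WHERE serial_no = '10000301345';
```

This does not change the substance of the report above; whether the NPE is expected behavior when `HoodieParquetInputFormat` is forced at the session level is exactly the question the issue raises.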