GintokiYs opened a new issue #2513:
URL: https://github.com/apache/hudi/issues/2513
**Describe the problem you faced**
When I insert data through Hudi-Spark and sync it to Hive, I can query this COW table from the Hive CLI and get the data back (hudi-hadoop-mr-bundle-0.6.0 has been placed under ${HIVE_HOME}/lib):
```
hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
Query ID = root_20210201150400_fbb4e52b-c41d-4d6b-b1b8-4678a6642f2d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/02/01 15:04:02 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/01 15:04:02 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
Starting Job = job_1611822796186_0064, Tracking URL = http://node103:8088/proxy/application_1611822796186_0064/
Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0064
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2021-02-01 15:04:10,216 Stage-1 map = 0%, reduce = 0%
2021-02-01 15:04:17,510 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.01 sec
MapReduce Total cumulative CPU time: 6 seconds 10 msec
Ended Job = job_1611822796186_0064
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 6.01 sec HDFS Read: 10327962 HDFS Write: 397 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 10 msec
OK
20210201145958 20210201145958_0_9 10000301345/001942775096/2 20190909 e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-6-8_20210201145958.parquet 10000301345 NULL 20190505 001942775096 2 251942775095 1942775095 401345 A 222 223 02 301346 NULL NULL NULL NULL NULL 25 1612162791775 10000301345/001942775096/2 20190909
Time taken: 17.764 seconds, Fetched: 1 row(s)
```
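For reference, the issue does not show the Spark write-side configuration. Below is a minimal sketch (plain Python, no Spark dependency) of the kind of Hudi write/Hive-sync option map such a job typically passes via `df.write.format("hudi").options(**opts)`. The option keys are standard Hudi 0.6.x datasource options, but the record key and partition field names here are assumptions, not taken from this report:

```python
# Hypothetical Hudi 0.6.x write + Hive-sync options for a COPY_ON_WRITE table.
# Only the option keys are standard Hudi datasource options; the field values
# are illustrative assumptions.
def hudi_write_options(table, database, record_key, partition_field):
    """Build the option map a Spark job would pass via .options(**opts)."""
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "insert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Hive sync, so the table becomes queryable from the Hive CLI:
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": partition_field,
    }

opts = hudi_write_options(
    table="hudi_imp_par_mor_local_x1",
    database="db1",                  # matches the HDFS path in the logs
    record_key="serial_no",          # assumption
    partition_field="partition_dt",  # assumption
)
```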
But when I **set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat**, the same query fails with the following error:
```
hive> set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat;
hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
Query ID = root_20210201151617_b6b7ee22-cc7a-4e22-b318-2e952d74e8dc
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/02/01 15:16:17 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/01 15:16:17 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/01 15:16:18 INFO utils.HoodieInputFormatUtils: Reading hoodie metadata from path hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
21/02/01 15:16:18 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
21/02/01 15:16:18 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@78d6692f, file:/etc/hive/conf.cloudera.hive/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_815164917_1, ugi=root (auth:SIMPLE)]]]
21/02/01 15:16:18 INFO table.HoodieTableConfig: Loading table properties from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1/.hoodie/hoodie.properties
21/02/01 15:16:18 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
21/02/01 15:16:18 INFO hadoop.HoodieParquetInputFormat: Found a total of 1 groups
21/02/01 15:16:18 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210201145958__commit__COMPLETED], [20210201150644__commit__COMPLETED]]
21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20190909, #FileGroups=1
21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20180909, #FileGroups=2
21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20181230, #FileGroups=1
21/02/01 15:16:18 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=8, NumFileGroups=4, FileGroupsCreationTime=10, StoreTimeTaken=4
21/02/01 15:16:18 INFO utils.HoodieInputFormatUtils: Total paths to process after hoodie filter 4
Starting Job = job_1611822796186_0067, Tracking URL = http://node103:8088/proxy/application_1611822796186_0067/
Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0067
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2021-02-01 15:16:25,401 Stage-1 map = 0%, reduce = 0%
2021-02-01 15:16:50,198 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1611822796186_0067 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1611822796186_0067_m_000003 (and more) from job job_1611822796186_0067
Examining task ID: task_1611822796186_0067_m_000002 (and more) from job job_1611822796186_0067
Task with the most failures(4):
-----
Task ID:
  task_1611822796186_0067_m_000000
URL:
  http://node103:8088/taskdetails.jsp?jobid=job_1611822796186_0067&tipid=task_1611822796186_0067_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:101)
	at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:447)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1109)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:477)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
	... 8 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
```
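For context on the `Total paths to process after hoodie filter 4` line above: HoodieParquetInputFormat trims the 8 listed files down to the latest base file per file group (4 file groups). A toy sketch of that selection idea, using the `<fileId>_<writeToken>_<instantTime>.parquet` naming visible in the query output earlier (the real logic lives in Hudi's `HoodieTableFileSystemView`; the file names below are made up):

```python
# Toy version of Hudi's "latest base file per file group" filtering.
# This only mimics the idea, not the real implementation.
def latest_base_files(paths):
    """Keep, per file group (fileId), the base file with the greatest
    commit instant time encoded in its name."""
    latest = {}
    for p in paths:
        name = p.rsplit("/", 1)[-1]
        stem = name[: -len(".parquet")]
        # Names look like <fileId>_<writeToken>_<instantTime>.parquet
        file_id, _write_token, instant = stem.rsplit("_", 2)
        if file_id not in latest or instant > latest[file_id][0]:
            latest[file_id] = (instant, p)
    return sorted(p for _, p in latest.values())

# Two commits wrote to file group "fg1-0"; only the newer file survives.
paths = [
    "20190909/fg1-0_0-6-8_20210201145958.parquet",
    "20190909/fg1-0_0-6-8_20210201150644.parquet",
    "20180909/fg2-0_1-6-8_20210201145958.parquet",
]
result = latest_base_files(paths)
```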
**Environment Description**
* Hudi version: 0.6.0
* Spark version: 2.4.0+cdh6.2.1
* Hive version: 2.1.1+cdh6.2.1
* Hadoop version: 3.0.0+cdh6.2.1
* Storage (HDFS/S3/GCS..): HDFS
* Running on Docker? (yes/no): no