GintokiYs opened a new issue #2513:
URL: https://github.com/apache/hudi/issues/2513


   **Describe the problem you faced**

   When I insert data through Hudi-Spark and synchronize it to Hive, I can query this CoW table with Hive CLI and get the data (hudi-hadoop-mr-bundle-0.6.0 has been placed under ${HIVE_HOME}/lib).
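
   For context, the write path looks roughly like the sketch below (spark-shell, Hudi 0.6.0). The sample schema, the record key / precombine / partition fields, and the Hive JDBC URL are illustrative placeholders, not the exact job configuration:

   ```scala
   // Minimal sketch of the Hudi-Spark insert + Hive sync, for spark-shell.
   // Field names, the sample row, and the JDBC URL are illustrative only.
   import org.apache.spark.sql.SaveMode
   import spark.implicits._  // `spark` is the spark-shell SparkSession

   val basePath = "hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1"
   val df = Seq(("10000301345", 1612162791775L, "20190909")).
     toDF("serial_no", "ts", "dt")  // illustrative schema

   df.write.format("org.apache.hudi").
     option("hoodie.table.name", "hudi_imp_par_mor_local_x1").
     option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
     option("hoodie.datasource.write.operation", "insert").
     option("hoodie.datasource.write.recordkey.field", "serial_no"). // illustrative
     option("hoodie.datasource.write.precombine.field", "ts").       // illustrative
     option("hoodie.datasource.write.partitionpath.field", "dt").    // illustrative
     option("hoodie.datasource.hive_sync.enable", "true").
     option("hoodie.datasource.hive_sync.database", "db1").
     option("hoodie.datasource.hive_sync.table", "hudi_imp_par_mor_local_x1").
     option("hoodie.datasource.hive_sync.partition_fields", "dt").   // illustrative
     option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://node103:10000"). // illustrative
     mode(SaveMode.Append).
     save(basePath)
   ```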
   
   ```
   hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
   Query ID = root_20210201150400_fbb4e52b-c41d-4d6b-b1b8-4678a6642f2d
   Total jobs = 1
   Launching Job 1 out of 1
   Number of reduce tasks is set to 0 since there's no reduce operator
   21/02/01 15:04:02 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
   21/02/01 15:04:02 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
   Starting Job = job_1611822796186_0064, Tracking URL = http://node103:8088/proxy/application_1611822796186_0064/
   Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0064
   Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
   2021-02-01 15:04:10,216 Stage-1 map = 0%,  reduce = 0%
   2021-02-01 15:04:17,510 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.01 sec
   MapReduce Total cumulative CPU time: 6 seconds 10 msec
   Ended Job = job_1611822796186_0064
   MapReduce Jobs Launched:
   Stage-Stage-1: Map: 1   Cumulative CPU: 6.01 sec   HDFS Read: 10327962 HDFS Write: 397 HDFS EC Read: 0 SUCCESS
   Total MapReduce CPU Time Spent: 6 seconds 10 msec
   OK
   20210201145958  20210201145958_0_9      10000301345/001942775096/2      20190909        e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-6-8_20210201145958.parquet     10000301345     NULL    20190505        001942775096   2       251942775095    1942775095      401345  A       222     223     02      301346  NULL    NULL    NULL    NULL    NULL    25      1612162791775   10000301345/001942775096/2      20190909
   Time taken: 17.764 seconds, Fetched: 1 row(s)
   ```
   But when I **set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat**, I encountered the following error:
   ```
   hive> set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat;
   hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
   Query ID = root_20210201151617_b6b7ee22-cc7a-4e22-b318-2e952d74e8dc
   Total jobs = 1
   Launching Job 1 out of 1
   Number of reduce tasks is set to 0 since there's no reduce operator
   21/02/01 15:16:17 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
   21/02/01 15:16:17 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
   21/02/01 15:16:18 INFO utils.HoodieInputFormatUtils: Reading hoodie metadata from path hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
   21/02/01 15:16:18 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
   21/02/01 15:16:18 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://nameservice1], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@78d6692f, file:/etc/hive/conf.cloudera.hive/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_815164917_1, ugi=root (auth:SIMPLE)]]]
   21/02/01 15:16:18 INFO table.HoodieTableConfig: Loading table properties from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1/.hoodie/hoodie.properties
   21/02/01 15:16:18 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://nameservice1/tmp/hudi/db1/hudi_imp_par_mor_local_x1
   21/02/01 15:16:18 INFO hadoop.HoodieParquetInputFormat: Found a total of 1 groups
   21/02/01 15:16:18 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210201145958__commit__COMPLETED], [20210201150644__commit__COMPLETED]]
   21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20190909, #FileGroups=1
   21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20180909, #FileGroups=2
   21/02/01 15:16:18 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :20181230, #FileGroups=1
   21/02/01 15:16:18 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=8, NumFileGroups=4, FileGroupsCreationTime=10, StoreTimeTaken=4
   21/02/01 15:16:18 INFO utils.HoodieInputFormatUtils: Total paths to process after hoodie filter 4
   Starting Job = job_1611822796186_0067, Tracking URL = http://node103:8088/proxy/application_1611822796186_0067/
   Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0067
   Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
   2021-02-01 15:16:25,401 Stage-1 map = 0%,  reduce = 0%
   2021-02-01 15:16:50,198 Stage-1 map = 100%,  reduce = 0%
   Ended Job = job_1611822796186_0067 with errors
   Error during job, obtaining debugging information...
   Examining task ID: task_1611822796186_0067_m_000003 (and more) from job job_1611822796186_0067
   Examining task ID: task_1611822796186_0067_m_000002 (and more) from job job_1611822796186_0067

   Task with the most failures(4):
   -----
   Task ID:
     task_1611822796186_0067_m_000000

   URL:
     http://node103:8088/taskdetails.jsp?jobid=job_1611822796186_0067&tipid=task_1611822796186_0067_m_000000
   -----
   Diagnostic Messages for this Task:
   Error: java.lang.RuntimeException: java.lang.NullPointerException
           at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
           at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
           at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
           at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
           at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
           at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
   Caused by: java.lang.NullPointerException
           at org.apache.hadoop.hive.ql.exec.AbstractMapOperator.getNominalPath(AbstractMapOperator.java:101)
           at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:447)
           at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1109)
           at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:477)
           at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
           ... 8 more

   FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
   MapReduce Jobs Launched:
   Stage-Stage-1: Map: 4   HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
   Total MapReduce CPU Time Spent: 0 msec
   ```
   
   **Environment Description**

   * Hudi version: 0.6.0

   * Spark version: 2.4.0+cdh6.2.1

   * Hive version: 2.1.1+cdh6.2.1

   * Hadoop version: 3.0.0+cdh6.2.1

   * Storage (HDFS/S3/GCS..): HDFS

   * Running on Docker? (yes/no): no
   

