[GitHub] [hudi] GintokiYs commented on issue #2513: [SUPPORT]Hive-Cli set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat and query error

GitBox Tue, 02 Feb 2021 00:15:57 -0800


GintokiYs commented on issue #2513:
URL: https://github.com/apache/hudi/issues/2513#issuecomment-771453564



   @n3nash Thank you for your reply.
   After I update a record in the Hudi table, a Hive-Cli query returns two
records with the same primary key, while the same query in Spark-SQL behaves
correctly and returns only one record.
   How can I stop the Hive-Cli query from returning the historical version of
the record?
   Below is the result of the Hive-Cli query, where
(10000301345/001942775096/2) is one of my composite primary keys.
   ```
   hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
   Query ID = root_20210202160414_00dbbdc9-5d2a-490a-b5ba-dcdccf2c8c1b
   Total jobs = 1
   Launching Job 1 out of 1
   Number of reduce tasks is set to 0 since there's no reduce operator
   21/02/02 16:04:14 INFO client.RMProxy: Connecting to ResourceManager at 
node103/10.20.29.103:8032
   21/02/02 16:04:14 INFO client.RMProxy: Connecting to ResourceManager at 
node103/10.20.29.103:8032
   Starting Job = job_1611822796186_0114, Tracking URL = 
http://node103:8088/proxy/application_1611822796186_0114/
   Kill Command = 
/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job 
 -kill job_1611822796186_0114
   Hadoop job information for Stage-1: number of mappers: 1; number of 
reducers: 0
   2021-02-02 16:04:22,116 Stage-1 map = 0%,  reduce = 0%
   2021-02-02 16:04:30,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 
6.91 sec
   MapReduce Total cumulative CPU time: 6 seconds 910 msec
   Ended Job = job_1611822796186_0114
   MapReduce Jobs Launched:
   Stage-Stage-1: Map: 1   Cumulative CPU: 6.91 sec   HDFS Read: 20638550 HDFS 
Write: 711 HDFS EC Read: 0 SUCCESS
   Total MapReduce CPU Time Spent: 6 seconds 910 msec
   OK
   20210201150644  20210201150644_0_178    10000301345/001942775096/2      
20190909        
e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-22-53_20210201150644.parquet   
10000301345     NULL    20190505        001942775096     2       251942775095   
 1942775095      401345  D       222     223     02      301346  NULL    NULL   
 NULL    NULL    NULL    25      1612163195163   10000301345/001942775096/2     
 20190909
   20210201145958  20210201145958_0_9      10000301345/001942775096/2      
20190909        
e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-6-8_20210201145958.parquet     
10000301345     NULL    20190505        001942775096     2       251942775095   
 1942775095      401345  A       222     223     02      301346  NULL    NULL   
 NULL    NULL    NULL    25      1612162791775   10000301345/001942775096/2     
 20190909
   Time taken: 17.288 seconds, Fetched: 2 row(s)
   ```
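
To make the expected result concrete, here is a minimal sketch of what a snapshot read should return for the two rows above: one row per record key, namely the one with the latest commit time. The rows are reduced to the Hudi metadata columns visible at the start of each output line. The helper names `file_id` and `latest_per_key` are hypothetical and are not part of any Hudi or Hive API. Both rows appear to come from the same file group (the file names share the same prefix and differ only in the write token and commit time), which would suggest Hive is reading an older file version alongside the latest one.

```python
# Illustrative sketch only: the two rows from the Hive-Cli output above,
# reduced to the leading Hudi metadata columns.
rows = [
    {
        "_hoodie_commit_time": "20210201150644",
        "_hoodie_record_key": "10000301345/001942775096/2",
        "_hoodie_partition_path": "20190909",
        "_hoodie_file_name": "e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-22-53_20210201150644.parquet",
    },
    {
        "_hoodie_commit_time": "20210201145958",
        "_hoodie_record_key": "10000301345/001942775096/2",
        "_hoodie_partition_path": "20190909",
        "_hoodie_file_name": "e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-6-8_20210201145958.parquet",
    },
]


def file_id(file_name):
    """Return the part of a base file name before the first underscore."""
    return file_name.split("_")[0]


def latest_per_key(rows):
    """Keep, for each (partition, record key), the row with the latest commit time.

    Commit times are fixed-width yyyyMMddHHmmss strings, so plain string
    comparison orders them chronologically.
    """
    latest = {}
    for row in rows:
        key = (row["_hoodie_partition_path"], row["_hoodie_record_key"])
        current = latest.get(key)
        if current is None or row["_hoodie_commit_time"] > current["_hoodie_commit_time"]:
            latest[key] = row
    return list(latest.values())


if __name__ == "__main__":
    # Both rows share one file ID, i.e. they are two versions of the same file.
    print({file_id(r["_hoodie_file_name"]) for r in rows})
    # The result I expect from the query: only the 20210201150644 row.
    for row in latest_per_key(rows):
        print(row["_hoodie_commit_time"], row["_hoodie_record_key"])
```

This only illustrates the result Spark-SQL already returns. It is not a fix for the Hive-Cli session.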


