GintokiYs commented on issue #2513: URL: https://github.com/apache/hudi/issues/2513#issuecomment-771453564
@n3nash Thank you for your reply.

After I update data in the Hudi table, a query from the Hive CLI returns two records with the same primary key, while the same query in Spark SQL behaves normally and returns only one record. How can I stop the Hive CLI query from returning the historical (pre-update) record?

The output below is the result of the Hive CLI query, where `10000301345/001942775096/2` is one of my composite primary keys.

```
hive> select * from hudi_imp_par_mor_local_x1 where serial_no = '10000301345';
Query ID = root_20210202160414_00dbbdc9-5d2a-490a-b5ba-dcdccf2c8c1b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/02/02 16:04:14 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
21/02/02 16:04:14 INFO client.RMProxy: Connecting to ResourceManager at node103/10.20.29.103:8032
Starting Job = job_1611822796186_0114, Tracking URL = http://node103:8088/proxy/application_1611822796186_0114/
Kill Command = /opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/hadoop/bin/hadoop job -kill job_1611822796186_0114
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2021-02-02 16:04:22,116 Stage-1 map = 0%, reduce = 0%
2021-02-02 16:04:30,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.91 sec
MapReduce Total cumulative CPU time: 6 seconds 910 msec
Ended Job = job_1611822796186_0114
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 6.91 sec HDFS Read: 20638550 HDFS Write: 711 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 910 msec
OK
20210201150644 20210201150644_0_178 10000301345/001942775096/2 20190909 e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-22-53_20210201150644.parquet 10000301345 NULL 20190505 001942775096 2 251942775095 1942775095 401345 D 222 223 02 301346 NULL NULL NULL NULL NULL 25 1612163195163 10000301345/001942775096/2 20190909
20210201145958 20210201145958_0_9 10000301345/001942775096/2 20190909 e3332789-77e5-4e6b-a0cd-24e87814c572-0_0-6-8_20210201145958.parquet 10000301345 NULL 20190505 001942775096 2 251942775095 1942775095 401345 A 222 223 02 301346 NULL NULL NULL NULL NULL 25 1612162791775 10000301345/001942775096/2 20190909
Time taken: 17.288 seconds, Fetched: 2 row(s)
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
