ruofan created HUDI-5990:
----------------------------

             Summary: Incremental queries on MOR sometimes miss data
                 Key: HUDI-5990
                 URL: https://issues.apache.org/jira/browse/HUDI-5990
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark-sql
    Affects Versions: 0.13.0, 0.12.2
            Reporter: ruofan
             Fix For: 0.14.0


env: hudi-0.12.2 spark-3.2.0

Currently, we have the following Hudi timeline and data files:
{code:java}
-rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095758155.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:58 20230326095810406.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095811072.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095820974.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095830980.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095840978.compaction.requested
-rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095841125.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:59 20230326095850994.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095900988.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095910983.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.1_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.2_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.3_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.4_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.5_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.1_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.2_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.3_0-1-0
-rw-r--r-- 1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.4_0-1-0
{code}
We use Spark to run incremental queries against this Hudi table. Data may go missing when the incremental range contains an incomplete (pending) compaction plan, such as 20230326095840978.compaction.requested above.
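To make the timeline state concrete, the Python sketch below (illustrative only, not Hudi code) parses the file names from the listing. It derives each instant's state from its meta-file suffix and groups the log files into file slices using Hudi's `.<fileId>_<baseInstant>.log.<version>_<writeToken>` naming. The helper names `instant_states` and `file_slices` are hypothetical. The output shows that compaction 20230326095840978 is only requested, and that the four newest log files belong to a file slice whose base instant is that pending compaction, consistent with Hudi opening a new file slice once compaction is scheduled.

```python
import re
from collections import defaultdict

# Timeline meta files from the listing above. The .inflight/.requested files of
# completed deltacommits are omitted because they do not change the final state.
TIMELINE_FILES = [
    "20230326095758155.deltacommit",
    "20230326095810406.deltacommit",
    "20230326095811072.deltacommit",
    "20230326095820974.deltacommit",
    "20230326095830980.deltacommit",
    "20230326095840978.compaction.requested",
    "20230326095841125.deltacommit",
    "20230326095850994.deltacommit",
    "20230326095900988.deltacommit",
    "20230326095910983.deltacommit",
    "20230326095920986.deltacommit.inflight",
    "20230326095920986.deltacommit.requested",
]

# The nine log files from the listing, all in one file group.
FILE_ID = "b9f3a322-b0fe-4f70-8ad8-aa2664be957c"
LOG_FILES = (
    [f".{FILE_ID}_20230326095758155.log.{v}_0-1-0" for v in range(1, 6)]
    + [f".{FILE_ID}_20230326095840978.log.{v}_0-1-0" for v in range(1, 5)]
)

# Log file naming: .<fileId>_<baseInstant>.log.<version>_<writeToken>
LOG_NAME = re.compile(
    r"^\.(?P<file_id>[0-9a-f-]+)_(?P<base>\d{17})\.log\.(?P<version>\d+)_(?P<token>.+)$"
)

# When several meta files exist for one instant, the later lifecycle state wins.
RANK = {"requested": 0, "inflight": 1, "completed": 2}


def instant_states(files):
    """Map instant time -> (action, state), using the meta-file suffix."""
    states = {}
    for name in files:
        instant, action, *rest = name.split(".")
        state = rest[0] if rest else "completed"
        prev = states.get(instant)
        if prev is None or RANK[state] > RANK[prev[1]]:
            states[instant] = (action, state)
    return states


def file_slices(log_files):
    """Group log-file versions by (fileId, base instant), i.e. by file slice."""
    slices = defaultdict(list)
    for name in log_files:
        m = LOG_NAME.match(name)
        slices[(m["file_id"], m["base"])].append(int(m["version"]))
    return dict(slices)


states = instant_states(TIMELINE_FILES)
pending = {t: s for t, s in states.items() if s[1] != "completed"}
slices = file_slices(LOG_FILES)

print(pending)
for (_, base), versions in sorted(slices.items()):
    print(base, versions, states[base])
# 20230326095758155 [1, 2, 3, 4, 5] ('deltacommit', 'completed')
# 20230326095840978 [1, 2, 3, 4] ('compaction', 'requested')
```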

Here is an example incremental query. Since begin_instance_time is exclusive and end_instance_time is inclusive, the range from 20230326095810406 to 20230326095900988 should return 6 commits, but only 3 are returned.
{code:java}
sql:
call copy_to_table(table=>'hudi_table',new_table=>'incremental_table',query_type=>'incremental',begin_instance_time=>'20230326095810406',end_instance_time=>'20230326095900988');
select _hoodie_commit_time,count(*) from incremental_table group by _hoodie_commit_time order by _hoodie_commit_time desc;

actual result:
+-------------------+--------+
|_hoodie_commit_time|count(1)|
+-------------------+--------+
|20230326095830980  |10      |
|20230326095820974  |10      |
|20230326095811072  |10      |
+-------------------+--------+

expected result:
+-------------------+--------+
|_hoodie_commit_time|count(1)|
+-------------------+--------+
|20230326095900988  |10      |
|20230326095850994  |10      |
|20230326095841125  |10      |
|20230326095830980  |10      |
|20230326095820974  |10      |
|20230326095811072  |10      |
+-------------------+--------+
{code}
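As a cross-check of the expected count, the Python sketch below applies the incremental range semantics (begin exclusive, end inclusive) to the completed deltacommits in the listing. `observed_model` is a hypothetical model of the symptom, not Hudi's implementation: it drops every commit written after the pending compaction 20230326095840978, which happens to reproduce the actual result above.

```python
# Completed deltacommits from the timeline listing, in timeline order.
COMPLETED_DELTACOMMITS = [
    "20230326095758155", "20230326095810406", "20230326095811072",
    "20230326095820974", "20230326095830980", "20230326095841125",
    "20230326095850994", "20230326095900988", "20230326095910983",
]
PENDING_COMPACTION = "20230326095840978"  # only .compaction.requested exists


def expected_commits(completed, begin, end):
    """Incremental range: instants with begin < t <= end.

    Instant times are fixed-width 17-digit strings, so string comparison
    matches chronological order.
    """
    return [t for t in completed if begin < t <= end]


def observed_model(completed, begin, end, pending_compaction):
    """Hypothetical model of the reported symptom (not Hudi code):
    commits after the pending compaction instant are dropped."""
    return [t for t in expected_commits(completed, begin, end)
            if t < pending_compaction]


BEGIN, END = "20230326095810406", "20230326095900988"
expected = expected_commits(COMPLETED_DELTACOMMITS, BEGIN, END)
observed = observed_model(COMPLETED_DELTACOMMITS, BEGIN, END, PENDING_COMPACTION)
missing = sorted(set(expected) - set(observed))

print(len(expected), len(observed))  # 6 3
print(missing)  # ['20230326095841125', '20230326095850994', '20230326095900988']
```

The three missing commits are exactly the rows absent from the actual result.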



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
