nonggia.liang created HUDI-4769:
-----------------------------------

             Summary: Option read.streaming.skip_compaction skips delta commit
                 Key: HUDI-4769
                 URL: https://issues.apache.org/jira/browse/HUDI-4769
             Project: Apache Hudi
          Issue Type: Bug
          Components: flink, flink-sql
            Reporter: nonggia.liang


Option read.streaming.skip_compaction was introduced to avoid consuming 
duplicate data from both delta-commits and compactions in MOR tables.
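To illustrate where the duplicates come from, here is a minimal sketch. It is a deliberately simplified, hypothetical model of one MOR file group, not Hudi's API or reader code. It assumes a compaction rewrites records that earlier delta-commits already wrote. A consumer that emits what every instant writes then sees those records twice, and skipping compaction instants removes the duplicates. The names `TIMELINE` and `consume` and the record keys are illustrative only.

```python
from typing import List, Tuple

# Hypothetical, simplified timeline of one MOR file group (not Hudi's API).
# Each entry: (instant, action, record keys written by that instant).
TIMELINE: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("d1", "deltacommit", ("k1",)),
    ("d2", "deltacommit", ("k2",)),
    # The compaction rewrites the data of d1 and d2 into a new base file.
    ("C3", "compaction", ("k1", "k2")),
]


def consume(timeline, skip_compaction: bool) -> List[str]:
    """Emit the records written by every instant, optionally ignoring compactions."""
    emitted: List[str] = []
    for _instant, action, keys in timeline:
        if skip_compaction and action == "compaction":
            continue  # compaction output duplicates data already emitted
        emitted.extend(keys)
    return emitted


print(consume(TIMELINE, skip_compaction=False))  # ['k1', 'k2', 'k1', 'k2']
print(consume(TIMELINE, skip_compaction=True))   # ['k1', 'k2']
```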

But the option may cause delta-commits to be skipped. Here is the case:

Suppose we have a timeline (d for delta-commit, C for compaction/commit):

d1 --> d2 --> C3 --> d3 --> d4 -->

^                           ^
t1                          t2

Let's say the streaming read scans at times t1 and t2, when the latest 
instants are d1 and d4, respectively.

When we scan at t2 with read.streaming.skip_compaction=true, we get the latest 
merged file slice, whose log files contain only d3 and d4. The data of d2 was 
compacted into the base file written by C3, which is skipped, so d2 is never 
consumed.
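The scenario can be reproduced with a small sketch. It is a hypothetical, simplified model, not Hudi's file-system view or Flink source. It assumes, as described above, that with skip_compaction=true the scan at t2 reads only the log files of the latest file slice and ignores that slice's base file. `FileSlice`, `build_file_slices`, `scan_skip_compaction` and `missed_delta_commits` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical model of the timeline above: (instant, action).
TIMELINE: List[Tuple[str, str]] = [
    ("d1", "deltacommit"),
    ("d2", "deltacommit"),
    ("C3", "compaction"),
    ("d3", "deltacommit"),
    ("d4", "deltacommit"),
]


@dataclass
class FileSlice:
    base_instant: Optional[str] = None  # instant that wrote the base file
    base_contains: List[str] = field(default_factory=list)  # delta-commits merged into the base
    log_instants: List[str] = field(default_factory=list)   # delta-commits still in log files


def build_file_slices(timeline) -> List[FileSlice]:
    """Delta-commits append logs; a compaction starts a new slice with a merged base."""
    slices = [FileSlice()]
    for instant, action in timeline:
        current = slices[-1]
        if action == "deltacommit":
            current.log_instants.append(instant)
        else:
            merged = current.base_contains + current.log_instants
            slices.append(FileSlice(base_instant=instant, base_contains=merged))
    return slices


def scan_skip_compaction(timeline, after: str) -> List[str]:
    """Delta-commits newer than `after` found in the latest slice's log files only."""
    order = [instant for instant, _ in timeline]
    latest = build_file_slices(timeline)[-1]
    return [i for i in latest.log_instants if order.index(i) > order.index(after)]


def missed_delta_commits(timeline, after: str) -> List[str]:
    """Delta-commits newer than `after` that the scan never returns."""
    order = [instant for instant, _ in timeline]
    seen = scan_skip_compaction(timeline, after)
    return [i for i, action in timeline
            if action == "deltacommit" and order.index(i) > order.index(after)
            and i not in seen]


# Scan at t2, continuing from d1 (the latest instant at t1).
print(scan_skip_compaction(TIMELINE, after="d1"))     # ['d3', 'd4']
print(missed_delta_commits(TIMELINE, after="d1"))     # ['d2']
print(build_file_slices(TIMELINE)[-1].base_contains)  # ['d1', 'd2']
```

The last line shows where d2 ends up in this model: only in the base file written by C3, which the skip_compaction path does not read.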



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
