[ https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
satish updated HUDI-687:
------------------------
Status: Open (was: New)
> incremental reads on MOR tables using RO view can lead to missing updates
> -------------------------------------------------------------------------
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Reporter: satish
> Assignee: satish
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.6.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create bucket1.log and append updates to it
> t2 -> request compaction
> t3 -> create bucket2.parquet
> If the compaction requested at t2 takes a long time, incremental reads using
> HoodieParquetInputFormat can skip the data ingested at t1, leading to apparent
> 'data loss'. (The data is still on disk, but incremental readers won't see it
> because it is in a log file and the readers have already moved on to t3.)
> To work around this problem, we want to stop returning data belonging to
> commits > t1 while the compaction is pending. After the compaction completes,
> incremental readers would see the updates in t2, t3, and so on.
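
The proposed workaround can be illustrated with a small sketch. This is not Hudi code: the Instant class and the readable_instants helper are hypothetical names made up for illustration, and the timeline is modelled as a plain list. The assumption is that an incremental read is capped just before the earliest pending compaction, so the reader does not advance past log data that has not been compacted yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instant:
    """Hypothetical, simplified stand-in for one timeline entry."""
    time: str        # sortable instant time, e.g. "t1"
    action: str      # "commit", "deltacommit" or "compaction"
    completed: bool  # False while the action is only requested or inflight


def earliest_pending_compaction(timeline: List[Instant]) -> Optional[str]:
    """Return the time of the oldest compaction that has not completed, if any."""
    pending = [i.time for i in timeline
               if i.action == "compaction" and not i.completed]
    return min(pending) if pending else None


def readable_instants(timeline: List[Instant], begin_time: str) -> List[str]:
    """Completed instants after begin_time that an incremental read may return.

    Anything at or after the earliest pending compaction is held back until
    that compaction completes.
    """
    cutoff = earliest_pending_compaction(timeline)
    result = []
    for instant in sorted(timeline, key=lambda i: i.time):
        if not instant.completed or instant.time <= begin_time:
            continue
        if cutoff is not None and instant.time >= cutoff:
            continue
        result.append(instant.time)
    return result


# The example timeline from the description, before and after compaction.
while_compacting = [
    Instant("t0", "commit", True),
    Instant("t1", "deltacommit", True),
    Instant("t2", "compaction", False),
    Instant("t3", "commit", True),
]
after_compaction = [
    Instant("t0", "commit", True),
    Instant("t1", "deltacommit", True),
    Instant("t2", "compaction", True),
    Instant("t3", "commit", True),
]
```

With these inputs, a reader whose last checkpoint is t1 gets nothing back while t2 is pending, and gets t2 and t3 once the compaction has completed, so its checkpoint never jumps over the updates written at t1.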
--
This message was sent by Atlassian Jira
(v8.3.4#803005)