[ 
https://issues.apache.org/jira/browse/HUDI-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1325:
---------------------------------
    Status: In Progress  (was: Open)

> Implement in-memory merging of metadata table with the non-synced part of 
> data timeline
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-1325
>                 URL: https://issues.apache.org/jira/browse/HUDI-1325
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Prashant Wason
>            Assignee: Ryan Pifer
>            Priority: Blocker
>
> Here is a corner case with syncing a completed compaction from the data 
> timeline to the metadata timeline. Consider the following sequence of events:
> t0: writer schedules compaction at time instant c
> t1: Compactor starts processing c's plan
> t2: compaction finishes with c.commit published on the data timeline (not yet 
> synced to metadata timeline)
> t3: In the next round of writing, the writer opens the metadata table, which 
> adds the base file produced in c.commit to the metadata table.
> Any queries running between t2 and t3 cannot rely on the metadata table, 
> since the new base file will not yet be present in it. The timeline will 
> indicate that the compaction completed, so the latest file slice will be 
> computed as simply the logs written to the file groups since compaction. 
> This will lead to incorrect results.
> If we consider just the writer alone, we may be okay, since we first sync the 
> metadata table before we do anything for the delta commit at t3. But in 
> general, for queries, we should advise enabling metadata-table-based listings 
> only after all writers/cleaner/compactor have been enabled to use metadata 
> and have been successfully using it to publish new/deleted files directly to 
> the metadata table. In short, queries cannot rely on the metadata table while 
> the syncing mechanism is the main thing that keeps the data and metadata 
> timelines together.
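
To make the window between t2 and t3 concrete, here is a minimal Python sketch of the idea named in the issue title: overlay the completed but not yet synced instants of the data timeline on top of the metadata table listing, in memory, at read time. It is a toy model under stated assumptions, not Hudi's implementation. The names `Instant`, `merged_listing`, `last_synced_time`, the file names, and the assumption that instant times are fixed-width strings that sort chronologically are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Instant:
    """A completed instant on the data timeline (hypothetical model)."""
    time: str  # assumed fixed-width, so string order is time order
    added: Dict[str, Set[str]] = field(default_factory=dict)    # partition -> files added
    deleted: Dict[str, Set[str]] = field(default_factory=dict)  # partition -> files removed


def merged_listing(
    metadata_files: Dict[str, Set[str]],
    last_synced_time: str,
    data_timeline: Iterable[Instant],
    partition: str,
) -> Set[str]:
    """List files in `partition`, overlaying unsynced instants in memory."""
    # Start from what the metadata table already knows (copy, do not mutate).
    files = set(metadata_files.get(partition, set()))
    for instant in sorted(data_timeline, key=lambda i: i.time):
        if instant.time <= last_synced_time:
            continue  # already reflected in the metadata table
        files |= instant.added.get(partition, set())
        files -= instant.deleted.get(partition, set())
    return files


# Metadata table is synced up to instant "001".
metadata = {"p1": {"fg1_base_001.parquet", "fg1_001.log.1"}}

# Compaction "002" has completed on the data timeline but is not yet synced.
timeline = [
    Instant("001", added={"p1": {"fg1_base_001.parquet", "fg1_001.log.1"}}),
    Instant("002", added={"p1": {"fg1_base_002.parquet"}}),
]

stale = metadata["p1"]                                   # what a query sees at t2..t3
merged = merged_listing(metadata, "001", timeline, "p1")  # with in-memory merge

print("fg1_base_002.parquet" in stale)   # False
print("fg1_base_002.parquet" in merged)  # True
```

In this model the plain metadata listing misses the base file produced by the compaction, while the merged view includes it without writing anything to the metadata table.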



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
