[ 
https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-1045:
----------------------------
    Sprint: Sprint 2024-04-26, 2024/06/17-30, 2024/06/03-16  (was: Sprint 
2024-04-26, 2024/06/03-16)

> Support updates during clustering
> ---------------------------------
>
>                 Key: HUDI-1045
>                 URL: https://issues.apache.org/jira/browse/HUDI-1045
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: clustering, table-service
>            Reporter: leesf
>            Assignee: Vinoth Chandar
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> We need to allow a writer W to write to file groups f1, f2, f3 concurrently 
> while a clustering service C reclusters them into f4, f5. 
> h4. Goals
>  * Writes can be either updates, deletes or inserts. 
>  * Either the clustering service C or the writer W can finish first.
>  * Both W and C need to be able to complete their actions without much 
> redoing of work. 
>  * The number of output file groups for C can be higher or lower than input 
> file groups. 
>  * Needs to work regardless of whether the writers are operating in OCC or 
> NBCC mode.
>  * Needs to interplay well with cleaning and compaction services.
> h4. Non-goals 
>  * Strictly preserving the sort order achieved by clustering in the face of 
> updates (e.g., updates that change clustering field values, causing output 
> clustering file groups to be not fully sorted by those fields).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
