[
https://issues.apache.org/jira/browse/HUDI-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886532#comment-17886532
]
sivabalan narayanan commented on HUDI-8291:
-------------------------------------------
h2. Design
h3. Design considerations:
We have two approaches to take: either we introduce a new state to the timeline, or we avoid making any changes to the timeline but take a hit on ensuring that all readers of pending clustering plans account for concurrent deletion of the plan, or deserialize ("deser") eagerly within the lock, which brings in some complexity.
We dive into the specifics below.
h3. Design Option 1: *Introducing an "abort" state to actions in the timeline*
As of now, an action typically starts in the "requested" state, moves to "inflight", and eventually reaches "complete". If not, a rollback is triggered, in which the commit files are deleted from the timeline. We are introducing a new state called "abort". With this, an action can move along either of two paths:
"requested" ➝ "inflight" ➝ "complete"
"requested" ➝ "inflight" ➝ "abort"
Abort means that the action was aborted or cancelled.
Cancellation requests for cancellable clustering plans will be added by any ingestion writer to a ".hoodie/cancel_requests/" folder.
Format:
[\{clustering_instant}_\{writer_commit_time}.cancel]
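For concreteness, here is a minimal sketch of building and parsing such a cancel-request file name. The class and method names are hypothetical illustrations, not actual Hudi APIs:

```java
// Hypothetical helper for the proposed cancel-request file naming scheme:
// {clustering_instant}_{writer_commit_time}.cancel
public class CancelRequestName {
    static final String SUFFIX = ".cancel";

    // Builds the cancel-request file name for a clustering instant.
    static String build(String clusteringInstant, String writerCommitTime) {
        return clusteringInstant + "_" + writerCommitTime + SUFFIX;
    }

    // Extracts the clustering instant a cancel request targets
    // (assumes instants are timestamps and contain no underscore).
    static String clusteringInstantOf(String fileName) {
        String stem = fileName.substring(0, fileName.length() - SUFFIX.length());
        return stem.substring(0, stem.indexOf('_'));
    }

    public static void main(String[] args) {
        String f = build("t5", "t7");
        System.out.println(f);                      // t5_t7.cancel
        System.out.println(clusteringInstantOf(f)); // t5
    }
}
```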
h3. Writer:
Whenever the writer detects a conflict w/ clustering and wants to proceed ahead, it will check if there are already any pending cancellation requests. If not:
{code:java}
acquire lock
if clustering is not yet complete and there is no pending cancellation request:
    add a cancellation request to ".hoodie/clustering_cancel_requests/"
else if there is already a pending cancellation request:
    return
else if clustering is already complete:
    return
release lock
{code}
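The lock-guarded check above could be sketched as follows. The lock and the in-memory sets standing in for the cancel-request folder and the completed-clustering timeline are stand-ins, not real Hudi classes:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the writer-side cancellation flow; all names are illustrative.
public class WriterCancelFlow {
    enum Outcome { CANCEL_REQUESTED, ALREADY_PENDING, CLUSTERING_COMPLETE }

    private final ReentrantLock lock = new ReentrantLock();
    // stand-in for files under ".hoodie/clustering_cancel_requests/"
    private final Set<String> cancelRequests = new HashSet<>();
    // stand-in for completed clustering instants on the timeline
    private final Set<String> completedClustering = new HashSet<>();

    void markClusteringComplete(String instant) {
        lock.lock();
        try {
            completedClustering.add(instant);
        } finally {
            lock.unlock();
        }
    }

    Outcome requestCancellation(String clusteringInstant) {
        lock.lock();
        try {
            if (completedClustering.contains(clusteringInstant)) {
                return Outcome.CLUSTERING_COMPLETE;   // too late to cancel
            }
            if (!cancelRequests.add(clusteringInstant)) {
                return Outcome.ALREADY_PENDING;       // someone already filed a request
            }
            return Outcome.CANCEL_REQUESTED;          // new cancel request filed
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        WriterCancelFlow w = new WriterCancelFlow();
        System.out.println(w.requestCancellation("t5")); // CANCEL_REQUESTED
        System.out.println(w.requestCancellation("t5")); // ALREADY_PENDING
    }
}
```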
In all cases, the writer assumes that it can proceed w/o any hiccups.
Todo:
If clustering completes, should the writer bail out?
Eventually, during the conflict resolution step:
{code:java}
if there is a conflict w/ an on-going clustering:
    if there is a cancellation request:
        continue and complete the commit
    else if the clustering has moved to the abort state:
        continue and complete the commit
    else if the clustering is completed, or in progress w/o any request for cancellation:
        the current writer should abort itself
{code}
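This decision table could be captured as a small predicate. The enum and flags below are illustrative assumptions, not Hudi types:

```java
// Sketch of the writer-side conflict resolution decision; names are hypothetical.
public class WriterConflictResolution {
    enum ClusteringState { INFLIGHT, ABORTED, COMPLETED }

    // Returns true if the ingestion writer may complete its commit despite
    // a conflict with the given clustering.
    static boolean canWriterProceed(ClusteringState state, boolean cancelRequested) {
        if (state == ClusteringState.ABORTED) {
            return true;  // clustering already cancelled
        }
        if (state == ClusteringState.INFLIGHT && cancelRequested) {
            return true;  // a cancel request is filed; clustering will abort
        }
        return false;     // completed, or in progress w/o a cancel request: writer aborts
    }

    public static void main(String[] args) {
        System.out.println(canWriterProceed(ClusteringState.INFLIGHT, true));  // true
        System.out.println(canWriterProceed(ClusteringState.COMPLETED, true)); // false
    }
}
```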
h4. Clustering execution runnable:
Just before wrapping up the clustering (we could optimize this down the line to do the check eagerly at different points in time; for now, let's say we do it at the end):
{code:java}
take a lock
check for any cancellation request
if yes:
    roll back the clustering
    and move the state to abort
else:
    continue as usual, including the usual conflict resolution. Because if the
    writer does not want to cancel the clustering, there could be data loss;
    hence the conflict resolution step should not change for the clustering writer.
{code}
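The end-of-clustering check could look like this sketch. The decision enum and the set standing in for the cancel-request folder are assumptions for illustration:

```java
import java.util.Set;

// Sketch of the check the clustering worker runs under the lock just before committing.
public class ClusteringFinalizer {
    enum Decision { ABORT, RESOLVE_CONFLICTS_AND_COMMIT }

    static Decision finalizeClustering(String clusteringInstant, Set<String> cancelRequests) {
        if (cancelRequests.contains(clusteringInstant)) {
            // roll back the partial clustering data, then move the instant to the abort state
            return Decision.ABORT;
        }
        // no cancellation: keep the usual conflict resolution, since skipping it
        // could cause data loss when the writer did not ask to cancel the clustering
        return Decision.RESOLVE_CONFLICTS_AND_COMMIT;
    }

    public static void main(String[] args) {
        Set<String> requests = Set.of("t5");
        System.out.println(finalizeClustering("t5", requests)); // ABORT
        System.out.println(finalizeClustering("t6", requests)); // RESOLVE_CONFLICTS_AND_COMMIT
    }
}
```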
h4. Consideration:
Instead of adding the cancel requests to an ad-hoc folder in ".hoodie", we could also introduce a new action state in the timeline:
t5.rc.requested
t5.rc.inflight
t5.rc.abort.requested
t5.rc.abort
Up until now, for the past 8+ years, we were able to confine ourselves to just 3 states (requested, inflight, complete). This introduces a 4th state, so we need to think hard about whether we want to go this route.
The above is just an alternative way of conveying the cancellation request to the clustering worker from another ingestion writer. Irrespective of this choice, the final abort state is required.
h4. Fixes to other entities accessing pending clustering.
Let's try to understand the usages of pending clustering:
1. Delete partition: when planning a delete-partition operation, ignores file groups to be replaced that already have a pending clustering.
2. Future clustering planning will ignore file groups from pending clustering plans.
2.a. Consistent bucket index: will ignore file groups that are part of an already pending clustering.
3. Future compaction planning will ignore file groups from pending clustering plans.
4. CommitActionExecutor: updateStrategy.handleUpdate() to resolve conflicts with pending clustering for incoming writes. We should be able to deprecate this once we have concurrent update support with pending clustering.
5. Upsert partitioner: to avoid adding small-file handling to file groups from pending clustering.
6. Presumably w/ consistent bucket index, incoming records will be added to both replaced and new file groups from pending clustering, and hence the pending clustering is required there.
In most cases, callers want to ignore file groups from a pending clustering, so if it is eventually deleted, that should not be an issue.
Delete partition and consistent bucket index might need some fixes to redo or replan themselves if they detect that a pending clustering was eventually deleted or is up for cancellation.
h4. Example illustration of the above design
Let's walk through a simple scenario and see how it might pan out.
t5.rc.requested
t5.rc.inflight
crashed.
restart:
we detect that there is a cancellation request and hence trigger a rollback.
t6.rb.requested
t6.rb.inflight
t6.rb.complete
Note: t6 fully rolls back t5's previous attempt.
And then we proceed to move the action to abort:
t5.rc.requested
t5.rc.inflight (empty file)
t5.rc.aborted (empty)
h3. Design Option 2:
All we are trying to achieve w/ approach 1 is to 1) inform the clustering worker that a particular clustering has to be cancelled, and 2) introduce the abort state so that no concurrent reader runs into a deser issue (without it, we would have to delete the clustering plan outright). In this design, we take a stab at avoiding both.
1) should be doable by adding some additional metadata to commits. Every writer will be given some priority; for simplicity, say the ingestion writer gets priority 1 and clustering gets priority 2. The ingestion writer will not worry about any conflicts wrt a pending clustering and will proceed to complete its commits.
On the other hand, the clustering worker, during its conflict resolution step, will find all conflicting commits and look for higher-priority writers (via commit metadata). Based on that, it will cancel itself.
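A sketch of that priority check follows; the priority field in commit metadata is a proposed extension, and these types are illustrative, not existing Hudi classes:

```java
import java.util.List;

// Sketch of priority-based conflict resolution for design option 2.
public class PriorityConflictResolver {
    static class ConflictingCommit {
        final String instant;
        final int writerPriority; // lower value = higher priority (ingestion = 1, clustering = 2)

        ConflictingCommit(String instant, int writerPriority) {
            this.instant = instant;
            this.writerPriority = writerPriority;
        }
    }

    // Clustering cancels itself if any conflicting commit came from a higher-priority writer.
    static boolean shouldCancelClustering(int clusteringPriority, List<ConflictingCommit> conflicts) {
        return conflicts.stream().anyMatch(c -> c.writerPriority < clusteringPriority);
    }

    public static void main(String[] args) {
        List<ConflictingCommit> conflicts = List.of(new ConflictingCommit("t7", 1));
        System.out.println(shouldCancelClustering(2, conflicts)); // true: ingestion wins
    }
}
```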
2) is what might be tricky.
Before we go further, let's try to understand the usages of pending clustering.
Please refer to the section above where I compiled all users/callers of pending clustering.
So, it is going to be tough to fix all callers so that they account for the scenario where a pending clustering could eventually be deleted or revoked.
At a bare minimum, we should account for the scenario where the timeline contains a pending replace commit, but by the time a caller is about to deser the replace commit metadata, it has been deleted or rolled back by the concurrent clustering worker.
If we really wanted to achieve this, we would need every caller to take a lock, refresh the timeline, eagerly deser the replace commit metadata, and then release the lock, so that while the caller is within the lock, desering the replace commit cannot fail. But after releasing the lock, a concurrent clustering worker could delete the replace commit, and the caller would still have to account for that.
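The lock-refresh-deser pattern could be sketched like this; the Timeline interface is a placeholder for the real Hudi timeline, and the byte payload stands in for the replace commit metadata:

```java
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of eager deserialization under the lock; types are placeholders.
public class EagerPlanReader {
    interface Timeline {
        void refresh();
        // empty if the plan was deleted/rolled back by a concurrent clustering worker
        Optional<byte[]> readReplaceCommitBytes(String instant);
    }

    // Deserialize inside the lock so the plan cannot be deleted mid-read.
    // After the lock is released, the caller must still tolerate the plan vanishing.
    static Optional<byte[]> readPlanEagerly(ReentrantLock lock, Timeline timeline, String instant) {
        lock.lock();
        try {
            timeline.refresh();
            return timeline.readReplaceCommitBytes(instant);
        } finally {
            lock.unlock();
        }
    }
}
```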
h2. Conclusion:
I feel approach 1 is elegant and does not involve a lot of complexity, but it does introduce a new state to the timeline. Will brainstorm w/ the team further.
> Add support to give priority to ingestion writer when there is a conflicting
> pending clustering
> -----------------------------------------------------------------------------------------------
>
> Key: HUDI-8291
> URL: https://issues.apache.org/jira/browse/HUDI-8291
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>
> As of now, when ingestion writer conflicts with pending clustering, the
> ingestion write is aborted, while the clustering eventually succeeds. But
> many times, ingestion writer might have stronger SLAs and might want to make
> progress over clustering, since clustering is considered to be more of an
> optimization.
>
> So, would be good to add this support to Hudi