[
https://issues.apache.org/jira/browse/HUDI-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886532#comment-17886532
]
sivabalan narayanan commented on HUDI-8291:
-------------------------------------------
h2. Design
h3. Design considerations:
We have two approaches to take: either we introduce a new state to the timeline, or we avoid making any changes to the timeline but take a hit on ensuring that all readers of pending clustering plans account for concurrent deletion of the plan, or deserialize ("deser") eagerly within the lock, which brings in some complexity.
We dive into the specifics below.
h3. Design Option 1: *Introducing an "abort" state to actions in the timeline*
As of now, an action typically starts in the "requested" state, moves to "inflight", and eventually reaches "complete". If not, a rollback is triggered, in which the commit files are deleted from the timeline. We are introducing a new state called "abort". With this, an action can move along either of two paths:
"requested" ➝ "inflight" ➝ "complete"
"requested" ➝ "inflight" ➝ "abort"
Abort means that the action was aborted or cancelled.
Cancellation requests for cancellable clustering plans will be added by any ingestion writer to a ".hoodie/cancel_requests/" folder.
Format:
[\{clustering_instant}_\{writer_commit_time}.cancel]
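For concreteness, here is a minimal sketch of building and parsing such a cancel-request file name. The class and method names are hypothetical illustrations, not actual Hudi APIs:

```java
// Hypothetical helper for the proposed cancel-request file naming scheme:
// {clustering_instant}_{writer_commit_time}.cancel
public class CancelRequestName {
    static final String SUFFIX = ".cancel";

    // Builds the cancel-request file name for a clustering instant.
    static String build(String clusteringInstant, String writerCommitTime) {
        return clusteringInstant + "_" + writerCommitTime + SUFFIX;
    }

    // Extracts the clustering instant a cancel request targets
    // (assumes instants are timestamps and contain no underscore).
    static String clusteringInstantOf(String fileName) {
        String stem = fileName.substring(0, fileName.length() - SUFFIX.length());
        return stem.substring(0, stem.indexOf('_'));
    }

    public static void main(String[] args) {
        String f = build("t5", "t7");
        System.out.println(f);                      // t5_t7.cancel
        System.out.println(clusteringInstantOf(f)); // t5
    }
}
```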
h3. Writer:
Whenever the writer detects a conflict w/ clustering and wants to proceed ahead, it will check if there are already any pending cancellation requests. If not:
{code:java}
acquire lock
if clustering is not yet complete and there is no pending cancellation request:
    add a cancellation request to ".hoodie/clustering_cancel_requests/"
else if there is already a pending cancellation request:
    return
else if clustering is already complete:
    return
release lock
{code}
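The lock-guarded check above could be sketched as follows. The lock and the in-memory sets standing in for the cancel-request folder and the completed-clustering timeline are stand-ins, not real Hudi classes:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the writer-side cancellation flow; all names are illustrative.
public class WriterCancelFlow {
    enum Outcome { CANCEL_REQUESTED, ALREADY_PENDING, CLUSTERING_COMPLETE }

    private final ReentrantLock lock = new ReentrantLock();
    // stand-in for files under ".hoodie/clustering_cancel_requests/"
    private final Set<String> cancelRequests = new HashSet<>();
    // stand-in for completed clustering instants on the timeline
    private final Set<String> completedClustering = new HashSet<>();

    void markClusteringComplete(String instant) {
        lock.lock();
        try {
            completedClustering.add(instant);
        } finally {
            lock.unlock();
        }
    }

    Outcome requestCancellation(String clusteringInstant) {
        lock.lock();
        try {
            if (completedClustering.contains(clusteringInstant)) {
                return Outcome.CLUSTERING_COMPLETE;   // too late to cancel
            }
            if (!cancelRequests.add(clusteringInstant)) {
                return Outcome.ALREADY_PENDING;       // someone already filed a request
            }
            return Outcome.CANCEL_REQUESTED;          // new cancel request filed
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        WriterCancelFlow w = new WriterCancelFlow();
        System.out.println(w.requestCancellation("t5")); // CANCEL_REQUESTED
        System.out.println(w.requestCancellation("t5")); // ALREADY_PENDING
    }
}
```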
In all cases, the writer assumes that it can proceed w/o any hiccups.
Todo:
If clustering completes, should the writer bail out?
Eventually, during the conflict resolution step:
{code:java}
if there is a conflict w/ an on-going clustering:
    if there is a cancellation request:
        continue and complete the commit
    else if the clustering has moved to the abort state:
        continue and complete the commit
    else if the clustering is completed, or in progress w/o any request for cancellation:
        the current writer should abort itself
{code}
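This decision table could be captured as a small predicate. The enum and flags below are illustrative assumptions, not Hudi types:

```java
// Sketch of the writer-side conflict resolution decision; names are hypothetical.
public class WriterConflictResolution {
    enum ClusteringState { INFLIGHT, ABORTED, COMPLETED }

    // Returns true if the ingestion writer may complete its commit despite
    // a conflict with the given clustering.
    static boolean canWriterProceed(ClusteringState state, boolean cancelRequested) {
        if (state == ClusteringState.ABORTED) {
            return true;  // clustering already cancelled
        }
        if (state == ClusteringState.INFLIGHT && cancelRequested) {
            return true;  // a cancel request is filed; clustering will abort
        }
        return false;     // completed, or in progress w/o a cancel request: writer aborts
    }

    public static void main(String[] args) {
        System.out.println(canWriterProceed(ClusteringState.INFLIGHT, true));  // true
        System.out.println(canWriterProceed(ClusteringState.COMPLETED, true)); // false
    }
}
```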
h4. Clustering execution runnable:
Just before wrapping up the clustering (we could optimize this down the line to do the check eagerly at different points in time; for now, let's say we do it at the end):
{code:java}
take a lock
check for any cancellation request
if yes:
    roll back the clustering
    and move the state to abort
else:
    continue as usual, including the usual conflict resolution. Because if the
    writer does not want to cancel the clustering, there could be data loss;
    hence the conflict resolution step should not change for the clustering writer.
{code}
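The end-of-clustering check could look like this sketch. The decision enum and the set standing in for the cancel-request folder are assumptions for illustration:

```java
import java.util.Set;

// Sketch of the check the clustering worker runs under the lock just before committing.
public class ClusteringFinalizer {
    enum Decision { ABORT, RESOLVE_CONFLICTS_AND_COMMIT }

    static Decision finalizeClustering(String clusteringInstant, Set<String> cancelRequests) {
        if (cancelRequests.contains(clusteringInstant)) {
            // roll back the partial clustering data, then move the instant to the abort state
            return Decision.ABORT;
        }
        // no cancellation: keep the usual conflict resolution, since skipping it
        // could cause data loss when the writer did not ask to cancel the clustering
        return Decision.RESOLVE_CONFLICTS_AND_COMMIT;
    }

    public static void main(String[] args) {
        Set<String> requests = Set.of("t5");
        System.out.println(finalizeClustering("t5", requests)); // ABORT
        System.out.println(finalizeClustering("t6", requests)); // RESOLVE_CONFLICTS_AND_COMMIT
    }
}
```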
h4. Consideration:
Instead of adding the cancel requests to an ad-hoc folder in ".hoodie", we could also introduce a new action state in the timeline:
t5.rc.requested
t5.rc.inflight
t5.rc.abort.requested
t5.rc.abort
Up until now, for the past 8+ years, we were able to confine ourselves to just 3 states (requested, inflight, complete). This introduces a 4th state, so we need to think hard about whether we want to go this route.
The above is just an alternative way of conveying the cancellation request to the clustering worker from another ingestion writer. Irrespective of this choice, the final abort state is required.
h4. Fixes to other entities accessing pending clustering.
Let's try to understand the usages of pending clustering:
1. Delete partition: when planning a delete-partition operation, ignores file groups to be replaced that already have a pending clustering.
2. Future clustering planning will ignore file groups from pending clustering plans.
2.a. Consistent bucket index: will ignore file groups that are part of an already pending clustering.
3. Future compaction planning will ignore file groups from pending clustering plans.
4. CommitActionExecutor: updateStrategy.handleUpdate() to resolve conflicts with pending clustering for incoming writes. We should be able to deprecate this once we have concurrent update support with pending clustering.
5. Upsert partitioner: to avoid adding small-file handling to file groups from pending clustering.
6. Presumably w/ consistent bucket index, incoming records will be added to both replaced and new file groups from pending clustering, and hence the pending clustering is required there.
In most cases, callers want to ignore file groups from a pending clustering, so if it is eventually deleted, that should not be an issue.
Delete partition and consistent bucket index might need some fixes to redo or replan themselves if they detect that a pending clustering was eventually deleted or is up for cancellation.
h4. Example illustration of the above design
Let's walk through a simple scenario and see how it might pan out.
t5.rc.requested
t5.rc.inflight
crashed.
restart:
we detect that there is a cancellation request and hence trigger a rollback.
t6.rb.requested
t6.rb.inflight
t6.rb.complete
Note: t6 fully rolls back t5's previous attempt.
And then we proceed to move the action to abort:
t5.rc.requested
t5.rc.inflight (empty file)
t5.rc.aborted (empty)
h3. Design Option 2:
All we are trying to achieve w/ approach 1 is to 1) inform the clustering worker that a particular clustering has to be cancelled, and 2) introduce the abort state so that no concurrent reader runs into a deser issue (without it, we would have to delete the clustering plan outright). In this design, we take a stab at avoiding both.
1) should be doable by adding some additional metadata to commits. Every writer will be given some priority; for simplicity, say the ingestion writer gets priority 1 and clustering gets priority 2. The ingestion writer will not worry about any conflicts wrt a pending clustering and will proceed to complete its commits.
On the other hand, the clustering worker, during its conflict resolution step, will find all conflicting commits and look for higher-priority writers (via commit metadata). Based on that, it will cancel itself.
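A sketch of that priority check follows; the priority field in commit metadata is a proposed extension, and these types are illustrative, not existing Hudi classes:

```java
import java.util.List;

// Sketch of priority-based conflict resolution for design option 2.
public class PriorityConflictResolver {
    static class ConflictingCommit {
        final String instant;
        final int writerPriority; // lower value = higher priority (ingestion = 1, clustering = 2)

        ConflictingCommit(String instant, int writerPriority) {
            this.instant = instant;
            this.writerPriority = writerPriority;
        }
    }

    // Clustering cancels itself if any conflicting commit came from a higher-priority writer.
    static boolean shouldCancelClustering(int clusteringPriority, List<ConflictingCommit> conflicts) {
        return conflicts.stream().anyMatch(c -> c.writerPriority < clusteringPriority);
    }

    public static void main(String[] args) {
        List<ConflictingCommit> conflicts = List.of(new ConflictingCommit("t7", 1));
        System.out.println(shouldCancelClustering(2, conflicts)); // true: ingestion wins
    }
}
```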
2) is what might be tricky.
Before we go further, let's try to understand the usages of pending clustering.
Please refer to the section above where I compiled all users/callers of pending clustering.
So, it is going to be tough to fix all callers so that they account for the scenario where a pending clustering could eventually be deleted or revoked.
At a bare minimum, we should account for the scenario where the timeline contains a pending replace commit, but by the time a caller is about to deser the replace commit metadata, it has been deleted or rolled back by the concurrent clustering worker.
If we really wanted to achieve this, we would need every caller to take a lock, refresh the timeline, eagerly deser the replace commit metadata, and then release the lock, so that while the caller is within the lock, desering the replace commit cannot fail. But after releasing the lock, a concurrent clustering worker could delete the replace commit, and the caller would still have to account for that.
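The lock-refresh-deser pattern could be sketched like this; the Timeline interface is a placeholder for the real Hudi timeline, and the byte payload stands in for the replace commit metadata:

```java
import java.util.Optional;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of eager deserialization under the lock; types are placeholders.
public class EagerPlanReader {
    interface Timeline {
        void refresh();
        // empty if the plan was deleted/rolled back by a concurrent clustering worker
        Optional<byte[]> readReplaceCommitBytes(String instant);
    }

    // Deserialize inside the lock so the plan cannot be deleted mid-read.
    // After the lock is released, the caller must still tolerate the plan vanishing.
    static Optional<byte[]> readPlanEagerly(ReentrantLock lock, Timeline timeline, String instant) {
        lock.lock();
        try {
            timeline.refresh();
            return timeline.readReplaceCommitBytes(instant);
        } finally {
            lock.unlock();
        }
    }
}
```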
h2. Conclusion:
I feel approach 1 is elegant and does not involve a lot of complexity, but it does introduce a new state to the timeline. Will brainstorm w/ the team further.
> Add support to give priority to ingestion writer when there is a conflicting
> pending clustering
> -----------------------------------------------------------------------------------------------
>
> Key: HUDI-8291
> URL: https://issues.apache.org/jira/browse/HUDI-8291
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
>
> As of now, when ingestion writer conflicts with pending clustering, the
> ingestion write is aborted, while the clustering eventually succeeds. But
> many times, ingestion writer might have stronger SLAs and might want to make
> progress over clustering, since clustering is considered to be more of an
> optimization.
>
> So, would be good to add this support to Hudi