Re: [PR] [RFC-79] Improving reliability of concurrent table service executions and rollbacks [hudi]

via GitHub Mon, 09 Sep 2024 15:55:23 -0700


kbuci commented on code in PR #11555:
URL: https://github.com/apache/hudi/pull/11555#discussion_r1751030953



##########
rfc/rfc-79/rfc-79-2.md:
##########
@@ -0,0 +1,99 @@
+w<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Add support for cancellable table service plans
+
+## Proposers
+
+
+## Approvers
+
+## Status
+
+JIRA: HUDI-7946
+
+
+## Abstract
+Table service plans can delay ingestion writes from updating a dataset with 
recent data if potential write conflicts are detected. Furthermore, a table 
service plan that isn't executed to completion for a large amount of time (due 
to repeated failures, application misconfiguration, or insufficient resources) 
will degrade the read/write performance of a dataset due to delaying clean, 
archival, and metadata table compaction. This is because currently HUDI table 
service plans, upon being scheduled, must be executed to completion. And 
additonally will prevent any ingestion write targeting the same files from 
succeeding (due to posing as a write conflict) as well as prevent any new table 
service plan from targeting the same files. Enabling a user to configure a 
table service plan as "cancellable" can prevent frequent or repeatedly failing 
table service plans from delaying ingestion. Support for cancellable plans will 
provide HUDI an avenue to fully cancel a table service plan and allow o
 ther table service to proceed.
+
+
+## Background
+### Execution of table services 
+The table service operations compact and cluster are by default "immutable" 
plans, meaning that once a plan is scheduled it will stay as as a pending 
instant until a caller invokes the table service execute API on the table 
service instant and sucessfully completes it. Specifically, if an inflight 
execution fails after transitioning the instant to inflight, the next execution 
attempt will implictly create and execute a rollback plan (which will delete 
all new instant/data files), but will keep the table service plan. This process 
will repeat until the instant is completed. The below visualization captures 
these transitions at a high level 
+
+![table service lifecycle 
(1)](https://github.com/user-attachments/assets/4a656bde-4046-4d37-9398-db96144207aa)
+
+## Clean and rollback of failed writes
+The clean table service, in addition to performing a clean action, is 
responsible for rolling back any failed ingestion writes 
(non-clustering/non-compaction inflight instants that are not being executed by 
a writer). This means that table services plans are not currently subject to 
clean. As detailed below, this proposal for supporting cancellable table 
service will require enabling clean be capable of targeting table service plans.
+
+## Goals
+### (1) A cancellable plan should be pre-empted by other writers
+The current requirement of HUDI needing to execute a table service plan to 
completion forces ingestion writers to abort a commit if a table service plan 
is conflicting. Becuase an ingestion writer typically determines the exact file 
groups it will be updating/replacing after building a workload profile and 
performing record tagging, the writer may have already spent a lot of time and 
resources before realizing that it needs to abort. In the face of frequent 
table service plans or an old inflight plan, this will cause delays in adding 
recent upstream records to the dataset as well as unecessairly take away 
resources (such as Spark executors in the case of Spark engine) from other 
applications in the data lake. A cancellable table service plan should avoid 
this situation by preventing itself from being comitted if a conflicting 
ingestion job has been comitted already. In conjunction, any ingestion writer 
or non-cancellable table service writer should be able to infer that a 
conflictin
 g inflight table service plan is cancellable, and therefore can be ignored 
when attempting to commit the instant.
+
+### (2) An inflight cancellable plan should be automatically cleaned up
+Another consequence of this existing table service flow is that a table 
service plan cannot be subject to clean's rollback of failed writes. Clean 
typically performs a rollback of inflight instants that are no longer being 
progressed by a writer (and have an inactive heartbeat). Because table service 
plans needed to be executed to completion and don't have an active heartbeat 
these inflight plans cannot be subject to this cleanup. Because an inflight 
plan remaining on the timeline can degrade performance of reads/writes (as 
mentioned earlier), a cancellable table service plan should be elligible to be 
targeted for cleanup if HUDI clean deems that it has remaining inflight for too 
long (or some other critera). Note that a failed table service should still be 
able to be safely cleaned up immeditaley - the goal here is just to make sure 
an inflight plan won't stay on the timeline for an unbounded amount of time but 
also won't be likely to be prematurely cleaned up by clean before it ha
 s a chance to be executed.
+
+## Design
+### Enabling a plan to be pre-emptable
+To satisfy goal (1), a new config flag "cancellable" can be added to a table 
service plan. A writer that intends to schedule a cancellable table service 
plan can enable the flag in the serialized plan metadata. Any writer executing 
the plan can infer that the plan is cancellable, and when trying to commit the 
instant should abort if it detects that any ingestion write or table service 
plan (without cancellable config flag) is targeting the same file groups. On 
the other side, the commit finalization flow for ingestion writers can be 
updated to ignore any inflight table service plans if they are cancellable.
+For the purpose of this design proposal, consider an ingestion job as having 
three steps:
+1. Schedule itself on the timeline with a new instant time in a .requested file
+2. Process/record tag incoming records, build a workload profile, and write 
the updating/replaced file groups to a "inflight" instant file on the timeline. 
Check for conflicts and abort if needed.
+3. Perform write conflict checks and commit the instant on the timeline
+
+The aforementioned changes to ingestion and table service flow will ensure 
that in the event of a conflicting ingestion and cancellable table service 
writer, the ingestion job will take precedence unless the table service job was 
completed before (2). Since in this scenario the ingestion job will see that a 
completed instant (a cancellable table service action) conflicts with its 
ongoing inflight write, and therefore it would not be legal to proceed. 
Unfourtatnely this means that this design cannot compeletly guarantee that 
ingestion job will always take precedence. But future enhancements/hueristics 
can be explored to descrease the chance of this scenario, such as
+* Have the ingestion writer write a "hint" of possible partitions it might 
affect in the .requested file, and the cancellable table service writer can 
check that before commiting the table service plan
+* If the cancellable table service writer sees that there is a .requested file 
for an ingestion action, it can try to wait some time for the .inflight to 
appear before performing write reconcilation checks
+
+### Handling cancellation of plans
+An additional config "cancellation-policy" can be added to the table service 
plan to indicate when it is ellgible to be permenatnly rolled back by writers 
other than the one responsbible for executing the table service. This policy 
can be a threshold of hours or instants on timeline, where if that # of 
hours/instants have elapsed since the plan was scheduled, any writer/operation 
can target it for rollback via clean. This policy should be configured by the 
writer scheduling a cacnellable table service, based on the amount of time they 
expect the plan to remain on the timeline before being picked up for execution. 
For example, if a table service writer is expected to immeditately start 
executing the plan after scheduling it, the the cancellation-policy can just be 
a few minutes. On the other hand, if the plan is expected to have its execution 
deferred to a few hours later, then the cancellation-policy should be more 
lenient. Note that this cancellation policy is not a repalacement fo
 r determining wether a table service plan is currently being executed - as 
wtih ingestion writes, cleanup of a cancellable table service plan should only 
start once it is confirmed that a ongoing writer is no longer progressing it. 
+
+In order to ensure that other writers can indeed permenantely cancel a 
cancellable table service plan (such that it can no longer be executed), 
additional changes to clean and table service flow will be need to be added as 
well. Two proposals are detailed below. Also, note that the cancellation-policy 
is only required to be honored by clean: a user can choose setup an application 
to aggresively clean up a failed cancellable table service plan even if it has 
not meet the critera for its cancellation-policy yet. This can be useful if a 
user wants a utility to manually ensure that clean/archival for a dataset 
progresses immdeitately or knows that a cancellable table service plan will not 
be attempted again or cleaned up by another writer. Each proposal provides an 
example on how to achieve this.
+
+#### (A) Making cancellable plans "mutable" 
+Cancellable table service plans can be updated to have a "mutuable" plan, in 
the sense that once a plan is transitioned to inflight, if the execution of it 
fails the plan must be rolled back and deleted, similar to rollback of failed 
ingestion writes. The flow for table service execution will be similar to the 
existing one for immutable plan, except that if the plan is targeted by a 
rollback plan its execution will abort.
+
+![mutable table service lifecycle 
(3)](https://github.com/user-attachments/assets/7c7d67f5-b6d3-44e6-990b-340920bc16a7)
+
+Once cancellable table service plans are made mutable in this manner, clean 
can rollback failed cancellable table service plans that have met the 
cancellation-policy critera, similar to how clean currently rolls back failed 
ingestion writes. Specifically, clean can check for any failed cancelled table 
service plans that are already part of a pending rollback plan or meet the 
cancellation-policy. From there a rollback can be scheduled/executed for each 
instant.
+With these changes, a failed cancellable table service plan that has met its 
cancellation policy will be guaranteed to be attempted for rollback by the next 
clean. If a user wants to immeditaly cleanup a failed cancellable plan, they 
can bypass the cancellation policy by scheduling and executing a rollback plan, 
the same way that clean will cleanup these plans.
+
+This meets the critera for goal (2). But comes with the following drawback:
+* The instant metadata file for the cancellable table service plan will be 
deleted on rollback, analogous to how rollback of a ingestion instant works. 
This can make it more difficult to debug failed/stuck cancellable table service 
plans
+  
+#### (B) Adding a cancel operation/state for cancellable plans
+An alternate approach can involve updating the possible tmeline actions / 
states, by making the following changes:
+* Add an ".aborted" state type for cancellable table service plan. 
+* Add a new action type "cancel" with two states ".cancel.requested" and 
".cancel". The  ".cancel.requested" metadata file will be a plan that targets a 
(cancellable table service) instant. Once said instant is transitioned to 
aborted state, the action can be completed and transitioned to ".cancel"
+
+A new cancel API will be added that a writer can use to target a cancellable 
table service plan to be aborted. It will create a cancel.request plan for the 
target instant, and execute it. If an existing cancel.requested plan for the 
target already exists, it will try to execute that directly (similar to how the 
rollback API handles pending rollbacks). Execution of a cancel action involves 
the followings steps
+1. Rollback the instant without deleting the table service plan.
+2. Transition the table service instant to .aborted, if it hasn't been already
+3. Transition the cancel plan to .cancel
+Once the cancel action has been transitioned to ".cancel", it can be 
considered complete. The reason this cancel action needs a ".requested" state 
is in order to allow clean/archival to be able to infer when a cancel action is 
completed.

Review Comment:
   Sure. Another way I had in mind was adjusting this approach to not need an 
"aborted" state but instant have an "empty" commit. For example, upon being 
cancelled and rolled back, a clustering instant would still be completed with a 
".repalcecommit" but wouldn't actually mark any files as replaced in the 
metadata. But I felt this might be confusing to users and would cause 
replacecommit.requested not match the replacecommit file metadata



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [RFC-79] Improving reliability of concurrent table service executions and rollbacks [hudi]

Reply via email to