kbuci commented on code in PR #11555:
URL: https://github.com/apache/hudi/pull/11555#discussion_r1751030953
##########
rfc/rfc-79/rfc-79-2.md:
##########
@@ -0,0 +1,99 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Add support for cancellable table service plans
+
+## Proposers
+
+
+## Approvers
+
+## Status
+
+JIRA: HUDI-7946
+
+
+## Abstract
+Table service plans can delay ingestion writes from updating a dataset with recent data if potential write conflicts are detected. Furthermore, a table service plan that isn't executed to completion for a long time (due to repeated failures, application misconfiguration, or insufficient resources) will degrade the read/write performance of a dataset by delaying clean, archival, and metadata table compaction. This is because HUDI table service plans, upon being scheduled, must currently be executed to completion. Additionally, a pending plan will prevent any ingestion write targeting the same files from succeeding (because it poses a write conflict) and will prevent any new table service plan from targeting the same files. Enabling a user to configure a table service plan as "cancellable" can prevent frequent or repeatedly failing table service plans from delaying ingestion.
Support for cancellable plans will provide HUDI an avenue to fully cancel a table service plan and allow other table services to proceed.
+
+
+## Background
+### Execution of table services
+The table service operations compact and cluster are by default "immutable" plans, meaning that once a plan is scheduled it will stay as a pending instant until a caller invokes the table service execute API on the table service instant and successfully completes it. Specifically, if an inflight execution fails after transitioning the instant to inflight, the next execution attempt will implicitly create and execute a rollback plan (which will delete all new instant/data files), but will keep the table service plan. This process will repeat until the instant is completed. The visualization below captures these transitions at a high level.
+
+
+
+### Clean and rollback of failed writes
+The clean table service, in addition to performing a clean action, is responsible for rolling back any failed ingestion writes (non-clustering/non-compaction inflight instants that are not being executed by a writer). This means that table service plans are not currently subject to clean. As detailed below, this proposal for supporting cancellable table services will require enabling clean to target table service plans.
+
+## Goals
+### (1) A cancellable plan should be pre-empted by other writers
+The current requirement that HUDI execute a table service plan to completion forces ingestion writers to abort a commit if a table service plan is conflicting. Because an ingestion writer typically determines the exact file groups it will be updating/replacing after building a workload profile and performing record tagging, the writer may have already spent a lot of time and resources before realizing that it needs to abort.
In the face of frequent table service plans or an old inflight plan, this will cause delays in adding recent upstream records to the dataset as well as unnecessarily take away resources (such as Spark executors in the case of the Spark engine) from other applications in the data lake. A cancellable table service plan should avoid this situation by preventing itself from being committed if a conflicting ingestion job has been committed already. In conjunction, any ingestion writer or non-cancellable table service writer should be able to infer that a conflicting inflight table service plan is cancellable, and can therefore be ignored when attempting to commit the instant.
+
+### (2) An inflight cancellable plan should be automatically cleaned up
+Another consequence of this existing table service flow is that a table service plan cannot be subject to clean's rollback of failed writes. Clean typically performs a rollback of inflight instants that are no longer being progressed by a writer (and have an inactive heartbeat). Because table service plans need to be executed to completion and don't have an active heartbeat, these inflight plans cannot be subject to this cleanup. Because an inflight plan remaining on the timeline can degrade performance of reads/writes (as mentioned earlier), a cancellable table service plan should be eligible to be targeted for cleanup if HUDI clean deems that it has remained inflight for too long (or meets some other criteria). Note that a failed table service should still be able to be safely cleaned up immediately - the goal here is just to make sure an inflight plan won't stay on the timeline for an unbounded amount of time but also won't be likely to be prematurely cleaned up by clean before it has a chance to be executed.
+
+## Design
+### Enabling a plan to be pre-emptable
+To satisfy goal (1), a new config flag "cancellable" can be added to a table service plan.
A writer that intends to schedule a cancellable table service plan can enable the flag in the serialized plan metadata. Any writer executing the plan can infer that the plan is cancellable, and when trying to commit the instant should abort if it detects that any ingestion write or table service plan (without the cancellable config flag) is targeting the same file groups. On the other side, the commit finalization flow for ingestion writers can be updated to ignore any inflight table service plans if they are cancellable.
+For the purpose of this design proposal, consider an ingestion job as having three steps:
+1. Schedule itself on the timeline with a new instant time in a .requested file
+2. Process/record tag incoming records, build a workload profile, and write the updating/replaced file groups to an "inflight" instant file on the timeline. Check for conflicts and abort if needed.
+3. Perform write conflict checks and commit the instant on the timeline
+
+The aforementioned changes to the ingestion and table service flows will ensure that in the event of a conflicting ingestion and cancellable table service writer, the ingestion job will take precedence unless the table service job was completed before (2). In that scenario the ingestion job will see that a completed instant (a cancellable table service action) conflicts with its ongoing inflight write, and therefore it would not be legal to proceed. Unfortunately this means that this design cannot completely guarantee that the ingestion job will always take precedence.
But future enhancements/heuristics can be explored to decrease the chance of this scenario, such as
+* Have the ingestion writer write a "hint" of possible partitions it might affect in the .requested file, so the cancellable table service writer can check it before committing the table service plan
+* If the cancellable table service writer sees that there is a .requested file for an ingestion action, it can try to wait some time for the .inflight to appear before performing write reconciliation checks
+
+### Handling cancellation of plans
+An additional config "cancellation-policy" can be added to the table service plan to indicate when it is eligible to be permanently rolled back by writers other than the one responsible for executing the table service. This policy can be a threshold of hours or instants on the timeline, where if that number of hours/instants has elapsed since the plan was scheduled, any writer/operation can target it for rollback via clean. This policy should be configured by the writer scheduling a cancellable table service, based on the amount of time they expect the plan to remain on the timeline before being picked up for execution. For example, if a table service writer is expected to immediately start executing the plan after scheduling it, then the cancellation-policy can just be a few minutes. On the other hand, if the plan is expected to have its execution deferred to a few hours later, then the cancellation-policy should be more lenient. Note that this cancellation policy is not a replacement for determining whether a table service plan is currently being executed - as with ingestion writes, cleanup of a cancellable table service plan should only start once it is confirmed that an ongoing writer is no longer progressing it.
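
Editor's sketch (not part of the quoted diff): the eligibility rule described in the paragraph above could look roughly like the following. This is a minimal illustration under stated assumptions, not Hudi code. `PendingPlan`, its fields, and `eligibleForRollbackByClean` are hypothetical names invented here, and the policy is modeled only as an elapsed-time threshold (the RFC also allows a threshold counted in instants).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; none of these types exist in Hudi.
final class CancellationPolicySketch {

    /** Hypothetical view of a pending table service plan on the timeline. */
    static final class PendingPlan {
        final boolean cancellable;            // the proposed "cancellable" flag
        final Instant scheduledAt;            // when the plan was scheduled
        final Duration cancellationPolicy;    // proposed "cancellation-policy", as elapsed time
        final boolean writerHeartbeatActive;  // true if a writer is still progressing the plan

        PendingPlan(boolean cancellable, Instant scheduledAt,
                    Duration cancellationPolicy, boolean writerHeartbeatActive) {
            this.cancellable = cancellable;
            this.scheduledAt = scheduledAt;
            this.cancellationPolicy = cancellationPolicy;
            this.writerHeartbeatActive = writerHeartbeatActive;
        }
    }

    /**
     * Returns true when clean may target the plan for rollback: the plan is
     * cancellable, no writer is currently progressing it, and the policy
     * threshold has elapsed since the plan was scheduled.
     */
    static boolean eligibleForRollbackByClean(PendingPlan plan, Instant now) {
        if (!plan.cancellable) {
            return false; // immutable plans must still be executed to completion
        }
        if (plan.writerHeartbeatActive) {
            return false; // the policy never overrides an ongoing execution
        }
        Duration age = Duration.between(plan.scheduledAt, now);
        return age.compareTo(plan.cancellationPolicy) >= 0;
    }

    public static void main(String[] args) {
        Instant scheduled = Instant.parse("2024-01-01T00:00:00Z");
        PendingPlan plan = new PendingPlan(true, scheduled, Duration.ofHours(2), false);
        // One hour after scheduling the threshold has not elapsed; three hours after, it has.
        System.out.println(eligibleForRollbackByClean(plan, scheduled.plus(Duration.ofHours(1))));
        System.out.println(eligibleForRollbackByClean(plan, scheduled.plus(Duration.ofHours(3))));
    }
}
```

The heartbeat check comes before the age check on purpose: as the paragraph notes, the policy is not a replacement for determining whether a plan is currently being executed.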
+
+In order to ensure that other writers can indeed permanently cancel a cancellable table service plan (such that it can no longer be executed), additional changes to the clean and table service flows will need to be added as well. Two proposals are detailed below. Also, note that the cancellation-policy is only required to be honored by clean: a user can choose to set up an application to aggressively clean up a failed cancellable table service plan even if it has not met the criteria for its cancellation-policy yet. This can be useful if a user wants a utility to manually ensure that clean/archival for a dataset progresses immediately, or knows that a cancellable table service plan will not be attempted again or cleaned up by another writer. Each proposal provides an example of how to achieve this.
+
+#### (A) Making cancellable plans "mutable"
+Cancellable table service plans can be updated to have a "mutable" plan, in the sense that once a plan is transitioned to inflight, if its execution fails the plan must be rolled back and deleted, similar to rollback of failed ingestion writes. The flow for table service execution will be similar to the existing one for immutable plans, except that if the plan is targeted by a rollback plan its execution will abort.
+
+
+
+Once cancellable table service plans are made mutable in this manner, clean can roll back failed cancellable table service plans that have met the cancellation-policy criteria, similar to how clean currently rolls back failed ingestion writes. Specifically, clean can check for any failed cancellable table service plans that are already part of a pending rollback plan or meet the cancellation-policy. From there a rollback can be scheduled/executed for each instant.
+With these changes, a failed cancellable table service plan that has met its cancellation policy will be guaranteed to be attempted for rollback by the next clean.
If a user wants to immediately clean up a failed cancellable plan, they can bypass the cancellation policy by scheduling and executing a rollback plan, the same way that clean will clean up these plans.
+
+This meets the criteria for goal (2), but comes with the following drawback:
+* The instant metadata file for the cancellable table service plan will be deleted on rollback, analogous to how rollback of an ingestion instant works. This can make it more difficult to debug failed/stuck cancellable table service plans
+
+#### (B) Adding a cancel operation/state for cancellable plans
+An alternate approach can involve updating the possible timeline actions / states, by making the following changes:
+* Add an ".aborted" state type for cancellable table service plans.
+* Add a new action type "cancel" with two states ".cancel.requested" and ".cancel". The ".cancel.requested" metadata file will be a plan that targets a (cancellable table service) instant. Once said instant is transitioned to the aborted state, the action can be completed and transitioned to ".cancel"
+
+A new cancel API will be added that a writer can use to target a cancellable table service plan to be aborted. It will create a cancel.requested plan for the target instant, and execute it. If a cancel.requested plan for the target already exists, it will try to execute that directly (similar to how the rollback API handles pending rollbacks). Execution of a cancel action involves the following steps
+1. Rollback the instant without deleting the table service plan.
+2. Transition the table service instant to .aborted, if it hasn't been already
+3. Transition the cancel plan to .cancel
+Once the cancel action has been transitioned to ".cancel", it can be considered complete. The reason this cancel action needs a ".requested" state is in order to allow clean/archival to be able to infer when a cancel action is completed.

Review Comment:
Sure.
Another way I had in mind was adjusting this approach to not need an "aborted" state but instead have an "empty" commit. For example, upon being cancelled and rolled back, a clustering instant would still be completed with a ".replacecommit" but wouldn't actually mark any files as replaced in the metadata. But I felt this might be confusing to users and would cause replacecommit.requested to not match the replacecommit file metadata.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
