kbuci commented on code in PR #11555:
URL: https://github.com/apache/hudi/pull/11555#discussion_r1887664196
##########
rfc/rfc-79/rfc-79.md:
##########
@@ -0,0 +1,154 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Add support for cancellable table service plans
+
+## Proposers
+
+
+## Approvers
+
+## Status
+
+JIRA: HUDI-7946
+
+
+## Abstract
+Table service plans can delay ingestion writes that bring recent data into a dataset if potential write conflicts are detected. Furthermore, a table service plan that is not executed to completion for a long time (due to repeated failures, application misconfiguration, or insufficient resources) will degrade the read/write performance of a dataset, because it delays clean, archival, and metadata table compaction. This happens because HUDI table service plans currently must be executed to completion once they are scheduled. Additionally, a pending plan prevents any ingestion write targeting the same files from succeeding (since it is treated as a write conflict), and can also prevent new table service plans from targeting the same files. Enabling a user to configure a table service plan as "cancellable" can prevent frequent or repeatedly failing table service plans from delaying ingestion.
+Support for cancellable plans will give HUDI a way to fully cancel a table service plan and allow other table service and ingestion writers to proceed.
+
+
+## Background
+### Execution of table services
+The compaction and clustering table service operations produce "immutable" plans by default: once a plan is scheduled, it stays on the timeline as a pending instant until a caller invokes the table service execute API on that instant and successfully completes it. Specifically, if an execution attempt fails after transitioning the instant to inflight, the next execution attempt will implicitly create and execute a rollback plan (which deletes all new instant/data files) but will keep the table service plan. This process repeats until the instant is completed. The visualization below captures these transitions at a high level.
+
+
+### Clean and rollback of failed writes
+The clean table service, in addition to performing a clean action, is responsible for rolling back any failed ingestion writes (non-clustering/non-compaction inflight instants that are not being concurrently executed by a writer). This means that table service plans are currently not subject to clean's rollback of failed writes. As detailed below, this proposal for supporting cancellable table services benefits from enabling clean to target table service plans.
+
+## Goals
+### (A) A cancellable table service plan should be able to prevent itself from committing when a write conflict is present
+The current requirement that HUDI execute a table service plan to completion forces ingestion writers to abort a commit if a table service plan conflicts with it. Because an ingestion writer typically determines the exact file groups it will update/replace only after building a workload profile and performing record tagging, the writer may have already spent a lot of time and resources before realizing that it needs to abort.
+In the face of frequent table service plans or an old inflight plan, this delays adding recent upstream records to the dataset and unnecessarily takes resources (such as Spark executors, in the case of the Spark engine) away from other applications in the data lake. A cancellable table service plan should avoid this situation by refusing to commit, and cancelling itself, if a conflicting ingestion job has already been committed. Correspondingly, any ingestion writer or non-cancellable table service writer should be able to infer that a conflicting inflight table service plan is cancellable, and can therefore ignore it when attempting to commit its own instant.
+
+### (B) A cancellable table service plan should be eligible for cancellation at any point before committing
+A writer should be able to explicitly cancel a cancellable table service plan that a concurrent writer is currently executing, as long as the plan has not been committed yet. This requirement stems from the presence of concurrent and async writers for table service execution: a writer should not need to wait for a table service writer to make further progress or fail before it can confirm that its cancellation request will be honored. As will be shown later, this does not require the writer requesting the cancellation to be able to terminate/fail the writer executing the target cancellable table service plan.
+
+### (C) An incomplete cancellable plan should eventually have its partial writes cleaned up
+Although cancellation (either via an explicit request or due to a write conflict) can ensure that a table service write is never committed, there still needs to be a mechanism to permanently clean up its data and instant files. At minimum, the table service writer itself should be able to do this cleanup, but this is not sufficient, as orchestration issues, transient failures, or resource allocation can prevent table service writers from re-attempting their write.
+Clean can be used to guarantee that an incomplete cancellable plan is eventually cleaned up, since datasets that undergo clustering are expected to run regular clean operations anyway. Because an inflight plan remaining on the timeline can degrade read/write performance (as mentioned earlier), a cancellable table service plan should be eligible for cleanup if HUDI clean deems that it has remained inflight for too long (or meets some other criteria).
+Note that a failed table service plan should still be safely cleanable immediately. The goal here is only to ensure that an inflight plan does not stay on the timeline for an unbounded amount of time, while also making it unlikely that clean removes the plan prematurely, before it has had a chance to be executed.
+
+## Design
+### Enabling a plan to be cancellable
+To satisfy goal (A), a new config flag "cancellable" can be added to a table service plan. A writer that intends to schedule a cancellable table service plan can enable the flag in the serialized plan metadata. Any writer executing the plan can then infer that the plan is cancellable and, when trying to commit the instant, should abort if it detects that any ingestion write, or any table service plan without the cancellable flag, is targeting the same file groups. As a future optimization, the cancellable table service writer can use early conflict detection (instead of waiting until it commits the instant) to repeatedly poll for any conflicting write appearing on the timeline, and abort earlier if needed.
+On the ingestion side, the commit finalization flow for ingestion writers can be updated to ignore any inflight table service plans that are cancellable.
+For the purpose of this design proposal, consider an ingestion job as having three steps:
+1. Schedule itself on the timeline with a new instant time in a .requested file.
+2. Process/record tag incoming records, build a workload profile, and write the updated/replaced file groups to an "inflight" instant file on the timeline. Check for conflicts and abort if needed.
+3. Perform write conflict checks and commit the instant on the timeline.
+
+The changes to the ingestion and table service flows described above ensure that, when an ingestion writer and a cancellable table service writer conflict, the ingestion job takes precedence (causing the cancellable table service instant to eventually fail) as long as the cancellable table service has not been completed before step (2). If the cancellable table service has already been completed before step (2), the ingestion job will see that a completed instant (the cancellable table service action) conflicts with its ongoing inflight write, and therefore it cannot legally proceed.
+
+### Adding a cancel action and abort state for cancellable plans

Review Comment:
> or eg: delete partition operation.

Currently, if a delete partition operation targets a partition that has file groups included in a (non-cancellable) pending clustering plan, will it also not replace/delete those file groups?
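To make the commit-time rules from the quoted Design section easier to discuss, here is a minimal, hypothetical Java model of them. A cancellable plan aborts if an ingestion write that has reached step (2) or completed, or any non-cancellable table service plan, targets the same file groups. An ingestion writer ignores pending cancellable plans but not completed instants. Clean may roll back a cancellable plan that has stayed pending past a threshold, as in goal (C). All names (`CancellablePlanSketch`, `TimelineInstant`, `cancellablePlanMayCommit`, `ingestionMayCommit`, `cleanMayRollBack`) are illustrative assumptions, not Hudi APIs. Conflict detection is reduced to file-group overlap. Treating every completed overlapping instant as a conflict also ignores the instant-time windowing that real conflict resolution uses.

```java
import java.time.Duration;
import java.util.List;
import java.util.Set;

// Hypothetical model of the RFC-79 commit-time rules. None of these types or
// methods exist in Hudi; conflict detection is reduced to file-group overlap.
final class CancellablePlanSketch {

    enum Kind { INGESTION, TABLE_SERVICE }

    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // A simplified timeline entry. For table service plans, `cancellable`
    // stands in for the proposed flag in the serialized plan metadata.
    record TimelineInstant(String time, Kind kind, State state, boolean cancellable,
                           Set<String> fileGroups, Duration pendingFor) {
        boolean overlaps(TimelineInstant other) {
            return other.fileGroups().stream().anyMatch(fileGroups::contains);
        }
    }

    // Cancellable table service writer, at commit time: abort if an ingestion
    // write that has reached step (2) (inflight) or completed, or any
    // non-cancellable table service plan, targets the same file groups.
    static boolean cancellablePlanMayCommit(TimelineInstant plan, List<TimelineInstant> timeline) {
        for (TimelineInstant other : timeline) {
            if (other == plan || !plan.overlaps(other)) {
                continue;
            }
            boolean conflicts = other.kind() == Kind.INGESTION
                    ? other.state() != State.REQUESTED
                    : !other.cancellable();
            if (conflicts) {
                return false;
            }
        }
        return true;
    }

    // Ingestion writer, at commit time: pending cancellable plans (assumed here
    // to include requested as well as inflight ones) and pending ingestion
    // writes are ignored; pending non-cancellable plans and completed
    // instants still conflict.
    static boolean ingestionMayCommit(TimelineInstant write, List<TimelineInstant> timeline) {
        for (TimelineInstant other : timeline) {
            if (other == write || !write.overlaps(other)) {
                continue;
            }
            boolean ignorable = other.state() != State.COMPLETED
                    && (other.kind() == Kind.INGESTION || other.cancellable());
            if (!ignorable) {
                return false;
            }
        }
        return true;
    }

    // Goal (C): clean may target a cancellable plan that has stayed pending
    // longer than a threshold (the RFC leaves the exact criterion open).
    static boolean cleanMayRollBack(TimelineInstant plan, Duration maxPending) {
        return plan.kind() == Kind.TABLE_SERVICE
                && plan.cancellable()
                && plan.state() != State.COMPLETED
                && plan.pendingFor().compareTo(maxPending) > 0;
    }

    public static void main(String[] args) {
        TimelineInstant clustering = new TimelineInstant("001", Kind.TABLE_SERVICE,
                State.INFLIGHT, true, Set.of("fg-1", "fg-2"), Duration.ofHours(30));
        TimelineInstant ingestion = new TimelineInstant("002", Kind.INGESTION,
                State.INFLIGHT, false, Set.of("fg-2"), Duration.ofMinutes(5));
        List<TimelineInstant> timeline = List.of(clustering, ingestion);

        System.out.println(ingestionMayCommit(ingestion, timeline));            // true
        System.out.println(cancellablePlanMayCommit(clustering, timeline));     // false
        System.out.println(cleanMayRollBack(clustering, Duration.ofHours(24))); // true
    }
}
```

In the `main` example, the ingestion write commits past the pending cancellable clustering plan, and that plan would then abort at its own commit time. This matches the precedence described for step (2). Making the same plan non-cancellable, or marking it completed, would make `ingestionMayCommit` return `false` instead.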
