[ 
https://issues.apache.org/jira/browse/HUDI-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-8138:
------------------------------
    Fix Version/s: 1.1.0

> Filtering of clustering replacecommits should be resilient to ongoing 
> replacecommit rollbacks
> ---------------------------------------------------------------------------------------------
>
>                 Key: HUDI-8138
>                 URL: https://issues.apache.org/jira/browse/HUDI-8138
>             Project: Apache Hudi
>          Issue Type: Wish
>            Reporter: Krishen Bhan
>            Priority: Trivial
>             Fix For: 1.1.0
>
>
> *Issue*
> When a writer creates an AbstractFileSystem  via 
> `org.apache.hudi.common.table.view.AbstractTableFileSystemView#init`, the API 
>  `org.apache.hudi.common.util.ClusteringUtils#getAllPendingClusteringPlans` 
> is called which checks wether a repalcecommit plan is clustering. In a 
> similar manner, when a writer identifies failed instants to rollback, it 
> calls 
> `org.apache.hudi.client.BaseHoodieTableServiceClient#getInflightTimelineExcludeCompactionAndClustering`
>  which uses `org.apache.hudi.common.util.ClusteringUtils#isClusteringInstant` 
> to check wether the replacecommt plan is clustering. 
> This since prior to 
> [https://issues.apache.org/jira/projects/HUDI/issues/HUDI-7905?filter=allissues]
>  , both insert_overwrite and clustering operations use the replacecommit 
> timeline action type.
> If a writer is using these APIs while (non-clustering) instants are being 
> rolled back, these writers will unnecessarily fail with an exception, since 
> in between filtering the timeline for inflight replacecommits and reading the 
> plan metadata from DFS, the replacecommit.requested can be deleted by a 
> concurrent rollback (since it is legal to rollback a non-clustering 
> replacecommit plan). 
> *Scenario*
> For example, when an ingestion job executes the insert/upsert phase, before 
> it begins to map each input record into file group buckets it first 
> cross-checks the input records and the file groups they belong to with the 
> files modified by pending clustering instants. The following sequence events 
> can lead to the ingestion job failing
>  # There is a failed non-clustering replacecommit (RC) on timeline
>  # Job A starts an ingestion commit. During the execution of ingestion, the 
> upsert execution step finds RC on timeline. Because the 
> replacecommit.requested shows that RC isn’t a clustering and doesn’t have any 
> overlapping file groups with Job A’s in progress commit.
>  # Job B starts, and same as Job A it finds RC. It begins to check wether RC 
> has any pending clustering groups that could conflict with Job B’s 
> in-progress commit
>  # Job A completes its commit, and does its post-commit phase. This includes 
> a lazy clean, where it rolls back RC, completely removing it from timeline
>  # Job B attempts to open RC’s replacecommit.requested file,  but fails with 
> a file-not-found error due to the file no longer existing
> *Resolution*
> This limitation can be resolved by identifying specific APIs where HUDI 
> filters a set of inflight replacecommits for instants that are clustering. 
> The two cases mentioned above are specific APIs in HUDI, but there can 
> potentially be more. 
> Each case can be handled by updating the implementation to not suppress a 
> file-not-found error. In other words, if a repalcecommit.requested no longer 
> exists then it will be assumed that it was a non-clustering replacecommit. 
> This should be a safe assumption, since if the replacecommit.requested 
> belonged to a clustering operation then it would not have been deleted. 
> Although locking/synchronization might also potentially resolve this issue 
> (by having HUDI filter replacecommits + read all repalcecommit.requested 
> files under a table lock), it is likely not a feasible solution since HUDI 
> readers will not be able to use HUDI Multiwriter OCC semantics
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to