Krishen Bhan created HUDI-8138:
----------------------------------

             Summary: Filtering of clustering replacecommits should be 
resilient to ongoing replacecommit rollbacks
                 Key: HUDI-8138
                 URL: https://issues.apache.org/jira/browse/HUDI-8138
             Project: Apache Hudi
          Issue Type: Wish
            Reporter: Krishen Bhan


*Issue*

When a writer creates an AbstractTableFileSystemView via 
`org.apache.hudi.common.table.view.AbstractTableFileSystemView#init`, the API 
`org.apache.hudi.common.util.ClusteringUtils#getAllPendingClusteringPlans` is 
called, which checks whether a replacecommit plan is clustering. In a similar 
manner, when a writer identifies failed instants to roll back, it calls 
`org.apache.hudi.client.BaseHoodieTableServiceClient#getInflightTimelineExcludeCompactionAndClustering`
, which uses `org.apache.hudi.common.util.ClusteringUtils#isClusteringInstant` 
to check whether the replacecommit plan is clustering. 
These checks are needed because, prior to 
[https://issues.apache.org/jira/projects/HUDI/issues/HUDI-7905?filter=allissues]
 , both insert_overwrite and clustering operations used the replacecommit 
timeline action type.
If a writer is using these APIs while (non-clustering) instants are being 
rolled back, it can unnecessarily fail with an exception: in between filtering 
the timeline for inflight replacecommits and reading the plan metadata from 
DFS, the replacecommit.requested file can be deleted by a concurrent rollback 
(it is legal to roll back a non-clustering replacecommit plan). 



*Scenario*

For example, when an ingestion job executes the insert/upsert phase, before it 
begins to map each input record into file group buckets it first cross-checks 
the input records and the file groups they belong to with the files modified by 
pending clustering instants. The following sequence of events can lead to the 
ingestion job failing:
 # There is a failed non-clustering replacecommit (RC) on the timeline
 # Job A starts an ingestion commit. During ingestion, the upsert execution 
step finds RC on the timeline. The replacecommit.requested file shows that RC 
isn’t a clustering instant and doesn’t have any file groups overlapping with 
Job A’s in-progress commit, so Job A proceeds.
 # Job B starts and, like Job A, finds RC. It begins to check whether RC 
has any pending clustering groups that could conflict with Job B’s in-progress 
commit
 # Job A completes its commit and runs its post-commit phase. This includes a 
lazy clean, where it rolls back RC, completely removing it from the timeline
 # Job B attempts to open RC’s replacecommit.requested file, but fails with a 
file-not-found error because the file no longer exists
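
The interleaving above can be replayed deterministically with a minimal sketch. It uses plain `java.nio.file` calls on a temporary directory as a stand-in for the timeline; the class name, method name, and file name are hypothetical and are not Hudi APIs. It only illustrates the check-then-read gap, not how Hudi actually lists or reads instants.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical stand-in for the timeline; none of these names are Hudi APIs.
class RollbackRaceSketch {

    /** Replays steps 1, 3, 4 and 5; returns true if Job B hits file-not-found. */
    static boolean jobBFailsWithFileNotFound() {
        try {
            Path timeline = Files.createTempDirectory("timeline");
            // Step 1: a failed non-clustering replacecommit (RC) is on the timeline.
            Path rc = Files.write(
                timeline.resolve("001.replacecommit.requested"), new byte[0]);

            // Step 3: Job B filters the timeline and sees RC as pending.
            boolean jobBSawRc = Files.exists(rc);

            // Step 4: Job A's post-commit rollback removes RC from the timeline.
            Files.delete(rc);

            // Step 5: Job B now tries to read the plan it saw a moment ago.
            boolean failed;
            try {
                Files.readAllBytes(rc);
                failed = false;
            } catch (NoSuchFileException e) {
                failed = jobBSawRc;
            }
            Files.delete(timeline); // clean up the empty temp directory
            return failed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Job B failed with file-not-found: "
            + jobBFailsWithFileNotFound());
    }
}
```

In this sketch the two jobs are interleaved by hand in a single thread, so the failure is reproduced on every run rather than only under unlucky timing.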



*Resolution*

This limitation can be resolved by identifying specific APIs where HUDI filters 
a set of inflight replacecommits for instants that are clustering. The two 
cases mentioned above are specific APIs in HUDI, but there can potentially be 
more. 
Each case can be handled by updating the implementation to tolerate a 
file-not-found error. In other words, if a replacecommit.requested file no 
longer exists, it is assumed to have belonged to a non-clustering 
replacecommit. This should be a safe assumption, since a 
replacecommit.requested that belonged to a clustering operation would not have 
been deleted. 
Although locking/synchronization might also resolve this issue (by 
having HUDI filter replacecommits and read all replacecommit.requested files 
under a table lock), it is likely not a feasible solution, since HUDI readers 
will not be able to use HUDI Multiwriter OCC semantics.
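
As a rough illustration of the file-not-found handling proposed above, the following sketch treats a missing requested file as "not clustering" instead of failing. It is written against plain `java.nio.file` rather than Hudi's storage and timeline classes; `TolerantPlanRead`, `readPlanIfPresent`, `isPendingClustering`, and the `looksLikeClustering` predicate are all hypothetical names, and the predicate is only a placeholder for whatever plan parsing the real code performs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper; not part of Hudi.
class TolerantPlanRead {

    /** Reads the plan bytes, or returns empty if the file was deleted concurrently. */
    static Optional<byte[]> readPlanIfPresent(Path requestedFile) {
        try {
            return Optional.of(Files.readAllBytes(requestedFile));
        } catch (NoSuchFileException e) {
            // Assumption from the issue: a deleted plan can only have belonged
            // to a non-clustering replacecommit that was rolled back.
            return Optional.empty();
        } catch (IOException e) {
            // Any other I/O failure is still surfaced to the caller.
            throw new UncheckedIOException(e);
        }
    }

    /** A missing plan is treated as "not clustering" rather than as an error. */
    static boolean isPendingClustering(Path requestedFile,
                                       Predicate<byte[]> looksLikeClustering) {
        return readPlanIfPresent(requestedFile)
            .map(looksLikeClustering::test)
            .orElse(false);
    }

    /** Checks one existing plan and one plan that has already been removed. */
    static List<Boolean> demo() {
        try {
            Path dir = Files.createTempDirectory("timeline");
            Path present = Files.write(
                dir.resolve("001.replacecommit.requested"), new byte[] {1});
            Path rolledBack = dir.resolve("002.replacecommit.requested"); // never created
            // Placeholder check: any non-empty plan counts as clustering here.
            Predicate<byte[]> nonEmptyPlan = bytes -> bytes.length > 0;
            List<Boolean> result = Arrays.asList(
                isPendingClustering(present, nonEmptyPlan),
                isPendingClustering(rolledBack, nonEmptyPlan));
            Files.delete(present);
            Files.delete(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the file-not-found case is swallowed; other I/O errors still propagate, so a genuinely unreadable clustering plan would not be silently skipped.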

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
