Krishen Bhan created HUDI-8138:
----------------------------------
Summary: Filtering of clustering replacecommits should be
resilient to ongoing replacecommit rollbacks
Key: HUDI-8138
URL: https://issues.apache.org/jira/browse/HUDI-8138
Project: Apache Hudi
Issue Type: Wish
Reporter: Krishen Bhan
*Issue*
When a writer initializes a file system view via
`org.apache.hudi.common.table.view.AbstractTableFileSystemView#init`, the API
`org.apache.hudi.common.util.ClusteringUtils#getAllPendingClusteringPlans` is
called, which checks whether a replacecommit plan is for clustering. In a similar
manner, when a writer identifies failed instants to rollback, it calls
`org.apache.hudi.client.BaseHoodieTableServiceClient#getInflightTimelineExcludeCompactionAndClustering`
which uses `org.apache.hudi.common.util.ClusteringUtils#isClusteringInstant`
to check whether the replacecommit plan is for clustering.
These checks are needed because, prior to
[https://issues.apache.org/jira/projects/HUDI/issues/HUDI-7905?filter=allissues],
both insert_overwrite and clustering operations use the replacecommit
timeline action type.
If a writer is using these APIs while (non-clustering) instants are being
rolled back, it can fail unnecessarily with an exception: in between filtering
the timeline for inflight replacecommits and reading the plan metadata from
DFS, the replacecommit.requested file can be deleted by a concurrent rollback
(since it is legal to roll back a non-clustering replacecommit plan).
*Scenario*
For example, when an ingestion job executes the insert/upsert phase, before it
begins to map each input record into file group buckets it first cross-checks
the input records and the file groups they belong to with the files modified by
pending clustering instants. The following sequence of events can lead to the
ingestion job failing:
# There is a failed non-clustering replacecommit (RC) on the timeline
# Job A starts an ingestion commit. During the execution of ingestion, the
upsert execution step finds RC on the timeline. The replacecommit.requested
shows that RC isn’t a clustering instant and doesn’t have any file groups
overlapping with Job A’s in-progress commit, so Job A proceeds.
# Job B starts, and same as Job A it finds RC. It begins to check whether RC
has any pending clustering groups that could conflict with Job B’s in-progress
commit
# Job A completes its commit, and does its post-commit phase. This includes a
lazy clean, where it rolls back RC, completely removing it from the timeline
# Job B attempts to open RC’s replacecommit.requested file, but fails with a
file-not-found error due to the file no longer existing
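The interleaving above can be replayed deterministically on a local file system. This is only a minimal sketch and does not use any HUDI classes: the temporary directory standing in for the timeline, the file name `001.replacecommit.requested`, the `INSERT_OVERWRITE` file contents, and the class `ReplaceCommitRaceDemo` are all hypothetical, and `java.nio.file` is assumed as a stand-in for DFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReplaceCommitRaceDemo {

    // First half of the check: list the pending replacecommit plan files.
    static List<Path> listRequestedPlans(Path timelineDir) throws IOException {
        try (Stream<Path> files = Files.list(timelineDir)) {
            return files
                .filter(p -> p.getFileName().toString().endsWith(".replacecommit.requested"))
                .collect(Collectors.toList());
        }
    }

    // Replays the scenario in a fixed order and reports whether the
    // plan read failed with a file-not-found error.
    static boolean readFailsAfterConcurrentRollback() {
        try {
            Path timelineDir = Files.createTempDirectory("timeline");
            Path rc = timelineDir.resolve("001.replacecommit.requested");
            // A failed non-clustering replacecommit is on the timeline.
            Files.write(rc, "INSERT_OVERWRITE".getBytes(StandardCharsets.UTF_8));

            // Job B filters the timeline and finds RC.
            List<Path> pending = listRequestedPlans(timelineDir);

            // Job A's lazy clean rolls RC back, deleting the plan file.
            Files.delete(rc);

            try {
                // Job B now tries to read the plan it listed earlier.
                Files.readAllBytes(pending.get(0));
                return false;
            } catch (NoSuchFileException e) {
                return true;
            } finally {
                Files.deleteIfExists(timelineDir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read failed: " + readFailsAfterConcurrentRollback());
    }
}
```

The point of the sketch is that listing and reading are two separate steps, so a delete can always land between them unless both are done under a lock.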
*Resolution*
This limitation can be resolved by identifying the specific APIs where HUDI
filters a set of inflight replacecommits for instants that are clustering. The
two cases mentioned above are examples of such APIs, but there may be more.
Each case can be handled by updating the implementation to suppress a
file-not-found error. In other words, if a replacecommit.requested no longer
exists then it will be assumed that it was a non-clustering replacecommit. This
should be a safe assumption, since if the replacecommit.requested belonged to a
clustering operation then it would not have been deleted.
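One possible shape for such a change is sketched below. It is an illustration under stated assumptions, not the actual HUDI implementation: `TolerantClusteringFilter`, `isClusteringPlan`, `filterPendingClustering`, and `demo` are hypothetical names, plan files are assumed to contain only an operation name such as `CLUSTER`, and `java.nio.file` stands in for DFS access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TolerantClusteringFilter {

    // Hypothetical stand-in for reading and parsing a replacecommit.requested plan.
    static boolean isClusteringPlan(Path requestedFile) throws IOException {
        String operation =
            new String(Files.readAllBytes(requestedFile), StandardCharsets.UTF_8).trim();
        return operation.equals("CLUSTER");
    }

    // Returns the names of the candidates that are clustering plans.
    // A plan file that has disappeared is treated as non-clustering.
    static List<String> filterPendingClustering(List<Path> candidates) {
        List<String> clustering = new ArrayList<>();
        for (Path candidate : candidates) {
            try {
                if (isClusteringPlan(candidate)) {
                    clustering.add(candidate.getFileName().toString());
                }
            } catch (NoSuchFileException e) {
                // Deleted by a concurrent rollback: assume it was not clustering and skip it.
            } catch (IOException e) {
                // Any other I/O failure is still surfaced to the caller.
                throw new UncheckedIOException(e);
            }
        }
        return clustering;
    }

    // Small usage example: one plan is removed after the listing was taken.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("timeline");
            Path overwrite = dir.resolve("001.replacecommit.requested");
            Path cluster = dir.resolve("002.replacecommit.requested");
            Path rolledBack = dir.resolve("003.replacecommit.requested");
            Files.write(overwrite, "INSERT_OVERWRITE".getBytes(StandardCharsets.UTF_8));
            Files.write(cluster, "CLUSTER".getBytes(StandardCharsets.UTF_8));
            Files.write(rolledBack, "INSERT_OVERWRITE".getBytes(StandardCharsets.UTF_8));

            List<Path> candidates = Arrays.asList(overwrite, cluster, rolledBack);
            Files.delete(rolledBack); // concurrent rollback after the listing

            List<String> result = filterPendingClustering(candidates);

            Files.deleteIfExists(overwrite);
            Files.deleteIfExists(cluster);
            Files.deleteIfExists(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the file-not-found case is swallowed here; other I/O errors still propagate, so genuine storage failures are not hidden by the change.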
Although locking/synchronization might also potentially resolve this issue (by
having HUDI filter replacecommits + read all replacecommit.requested files
under a table lock), it is likely not a feasible solution since HUDI readers
will not be able to use HUDI Multiwriter OCC semantics.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)