[
https://issues.apache.org/jira/browse/HUDI-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394392#comment-17394392
]
Vinoth Chandar commented on HUDI-2159:
--------------------------------------
[~nishith29] [~pwason] any updates on this? I'd like to get this fixed before
0.9.0 next week.
> Supporting Clustering and Metadata Table together
> -------------------------------------------------
>
> Key: HUDI-2159
> URL: https://issues.apache.org/jira/browse/HUDI-2159
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Blocker
> Fix For: 0.9.0
>
>
> I am testing clustering support for a metadata-enabled table and found a few
> issues.
> *Setup*
> Pipeline 1: Ingestion pipeline with Metadata Table enabled. Runs every 30
> mins.
> Pipeline 2: Clustering pipeline with long-running jobs (3-4 hours)
> Pipeline 3: Another clustering pipeline with long-running jobs (3-4 hours)
>
> *Issue #1: Parallel commits on Metadata Table*
> Assume the Clustering pipeline is completing T5.replacecommit and the ingestion
> pipeline is completing T10.commit. The Metadata Table will be synced up to an
> instant < T5 (say T4), since it only syncs in completion order.
> Now both the pipelines will call syncMetadataTable() which will do the
> following:
> # Find all un-synced instants from dataset (T5, T6 ... T10)
> # Read each instant and perform a deltacommit on the Metadata Table with the
> same timestamp as the instant.
> There is a chance that two processes perform a deltacommit at T5 on the
> metadata table and one will fail (instant file already exists). This raises
> an exception that is detected as a pipeline failure, leading to
> false-positive alerts.
>
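To make the collision in Issue #1 concrete, here is a minimal sketch in Python. It is not Hudi code: the function names, the integer timestamps, and the `NNN.deltacommit` file naming are all hypothetical, and an exclusive file create stands in for the "instant file already exists" check. It only models two pipelines that both believe the Metadata Table is synced up to T4.

```python
import os
import tempfile


def unsynced_instants(dataset_instants, last_synced):
    """Return dataset instants newer than the last synced one, in order."""
    return [t for t in sorted(dataset_instants) if t > last_synced]


def sync_metadata_table(metadata_dir, dataset_instants, last_synced):
    """Hypothetical stand-in for syncMetadataTable().

    Writes one deltacommit marker per un-synced instant, reusing the
    instant's timestamp. Mode "x" fails if the file already exists.
    """
    written = []
    for ts in unsynced_instants(dataset_instants, last_synced):
        path = os.path.join(metadata_dir, f"{ts:03d}.deltacommit")
        with open(path, "x"):
            pass
        written.append(ts)
    return written


def simulate_race():
    """Both pipelines see T5..T10 as un-synced and T4 as last synced."""
    instants = [5, 6, 7, 8, 9, 10]
    with tempfile.TemporaryDirectory() as metadata_dir:
        # Pipeline A syncs first and succeeds.
        first = sync_metadata_table(metadata_dir, instants, last_synced=4)
        # Pipeline B acts on the same stale view and collides at T5.
        try:
            sync_metadata_table(metadata_dir, instants, last_synced=4)
            second_error = None
        except FileExistsError as err:
            second_error = os.path.basename(err.filename)
    return first, second_error
```

In this sketch the second caller fails on the very first instant it tries to write, even though the Metadata Table ends up fully synced. That is the false-positive failure described above.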
> *Issue #2: No archiving/rollback support for failed clustering operations*
> If a clustering operation fails, it leaves behind a
> T5.replacecommit.inflight. There is no automated way to roll back or archive
> these. Since clustering is generally a long-running operation and may be run
> through multiple pipelines at the same time, automated rollback of left-over
> inflights does not work, as we cannot be sure that the process is dead.
> Metadata Table sync only works in completion order. So if
> T5.replacecommit.inflight is left over, the Metadata Table will not sync beyond
> T5, causing a large number of LogBlocks to pile up, which will have to be
> merged in memory, leading to deteriorating performance.
>
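The blocking behaviour in Issue #2 can be sketched the same way. This is a simplified model, not Hudi's implementation: `sync_frontier`, the `(timestamp, state)` tuples, and the two state names are assumptions for illustration. It assumes that sync advances only through consecutive completed instants and stops at the first inflight one.

```python
def sync_frontier(timeline, last_synced):
    """Model completion-order sync over a dataset timeline.

    timeline: iterable of (timestamp, state) pairs, where state is
    "completed" or "inflight" (hypothetical labels).
    Returns (synced_up_to, backlog), where backlog lists completed
    instants that cannot be synced because an earlier one is inflight.
    """
    synced = last_synced
    backlog = []
    blocked = False
    for ts, state in sorted(timeline):
        if ts <= last_synced:
            continue  # already synced
        if not blocked and state == "completed":
            synced = ts  # the frontier advances
        else:
            blocked = True  # an inflight instant stops the frontier
            if state == "completed":
                backlog.append(ts)
    return synced, backlog


# T5 is a left-over inflight replacecommit; T6..T10 completed after it.
timeline = [(4, "completed"), (5, "inflight")] + [
    (ts, "completed") for ts in range(6, 11)
]
synced_up_to, backlog = sync_frontier(timeline, last_synced=3)
```

With this timeline the frontier stays at T4 and T6 through T10 accumulate in the backlog. Those correspond to the un-synced instants whose LogBlocks have to be merged in memory on read.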