[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-5464:
-----------------------------
Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3
(was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2)
> Fix instantiation of a new partition in MDT re-using the same instant time as
> a regular commit
> ----------------------------------------------------------------------------------------------
>
> Key: HUDI-5464
> URL: https://issues.apache.org/jira/browse/HUDI-5464
> Project: Apache Hudi
> Issue Type: Bug
> Components: metadata
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Fix For: 0.13.0
>
>
> We re-use the instant time of the commit being applied to the MDT when
> instantiating a new partition in the MDT. This needs to be fixed.
>
> For example, let's say we have 10 commits with FILES already enabled.
> For C11, we are enabling col-stats.
> After the data table operations, when we enter metadata writer instantiation,
> we deduce that col-stats has to be instantiated and then instantiate it using
> DC11. In the MDT timeline, we see dc11.req, dc11.inflight and dc11.complete.
> Then we go ahead and apply the actual C11 from the DT to the MDT
> (dc11.inflight and dc11.complete are updated). Here, we overwrite the same
> DC11 with records pertaining to C11, which is buggy. We definitely need to
> fix this.
> We can add a suffix to C11 (say C11_003 or C11_001), as we do for compaction
> and clean in the MDT, so that any additional operation in the MDT has a
> different commit time format. Everything else should match the DT one-to-one.
>
>
> Impact:
> We are overwriting the same DC for two purposes, which is bad. If there is a
> crash after initializing col-stats and before applying the actual C11 (in the
> above context), we might mistakenly roll back the col-stats initialization,
> while the table config could still say that col-stats is fully ready to be
> served. But while reading the MDT, we may not read DC11 since it's a failed
> commit.
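
The suffix idea in the description can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not Hudi code: the timeline is modeled as a plain dict, and the names `INIT_SUFFIX`, `mdt_instant_for_partition_init` and `apply_to_timeline` are hypothetical. The suffix value "003" is taken from the C11_003 example above; the suffix the actual fix uses may differ.

```python
# Toy model of an MDT timeline: instant time -> purpose of the delta commit.
# All names here are hypothetical; none of them are Hudi APIs.

INIT_SUFFIX = "003"  # assumed value, borrowed from the "C11_003" example in the issue


def mdt_instant_for_partition_init(dt_instant: str, suffix: str = INIT_SUFFIX) -> str:
    """Derive a distinct MDT instant time for initializing a new partition."""
    return f"{dt_instant}_{suffix}"


def apply_to_timeline(timeline: dict, instant: str, purpose: str) -> None:
    """Record a delta commit, refusing to reuse an instant time that is already taken."""
    if instant in timeline:
        raise ValueError(
            f"instant {instant!r} is already used for {timeline[instant]!r}"
        )
    timeline[instant] = purpose


if __name__ == "__main__":
    timeline = {}
    # Initializing col-stats gets its own suffixed instant ...
    init_instant = mdt_instant_for_partition_init("C11")
    apply_to_timeline(timeline, init_instant, "initialize col-stats")
    # ... so applying the regular commit C11 no longer collides with it.
    apply_to_timeline(timeline, "C11", "apply DT commit C11")
    print(timeline)
```

With distinct instant times, a crash between the two steps would leave the initialization commit and the regular commit separately identifiable in this model, which is the property the description asks for.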
--
This message was sent by Atlassian Jira
(v8.20.10#820010)