[
https://issues.apache.org/jira/browse/HUDI-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Y Ethan Guo updated HUDI-8208:
------------------------------
Sprint: Hudi 1.0 Sprint 2024/09/16-22, Hudi 1.0 Sprint 2024/09/16-23 (was:
Hudi 1.0 Sprint 2024/09/16-22)
> Fix partition stats with compaction or clustering
> -------------------------------------------------
>
> Key: HUDI-8208
> URL: https://issues.apache.org/jira/browse/HUDI-8208
> Project: Apache Hudi
> Issue Type: Bug
> Components: metadata
> Reporter: Lokesh Jain
> Assignee: Lokesh Jain
> Priority: Blocker
> Fix For: 1.0.0
>
>
> Consider a partition with 10 file slices. If compaction is triggered for one
> file slice and produces fs1_1, the partition stats are updated for that file
> slice under the same key (the partition path). The older partition stat record
> for that partition path still accounts for the other 9 file slices
> (fs2_0 - fs10_0) plus the older stat for fs1_0. The final read value is
> therefore a merge of all versions of the file slices (fs2_0 - fs10_0, fs1_0,
> fs1_1), whereas it should account only for the latest version of fs1.
> Upon compaction or clustering, the partition stats should be recomputed and
> the older records for that partition should be invalidated.
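
The scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hudi's implementation: `ColumnStats`, `merge`, `aggregate`, and all numeric values are hypothetical, and the stats are reduced to a min, a max, and a value count for a single column.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical stats for one column; not Hudi's actual record schema."""
    min_value: int
    max_value: int
    value_count: int


def merge(a: ColumnStats, b: ColumnStats) -> ColumnStats:
    """Combine two stat records the way a key-based merge would."""
    return ColumnStats(
        min(a.min_value, b.min_value),
        max(a.max_value, b.max_value),
        a.value_count + b.value_count,
    )


def aggregate(slice_stats) -> ColumnStats:
    """Compute partition-level stats from a collection of file slice stats."""
    return reduce(merge, slice_stats)


# Latest file slices before compaction: fs1_0 and fs2_0 - fs10_0 (made-up values).
before = {f"fs{i}": ColumnStats(10 * i, 10 * i + 9, 100) for i in range(2, 11)}
before["fs1"] = ColumnStats(0, 9, 100)  # fs1_0

# Partition stat record written before compaction, keyed by partition path.
old_record = aggregate(before.values())

# Compaction rewrites fs1 as fs1_1 (assumed here to have a narrower range).
fs1_1 = ColumnStats(5, 9, 100)

# Behaviour described in the issue: the new record for fs1_1 is merged with
# the old record, so fs1_0 is still reflected in the result.
merged_with_old = merge(old_record, fs1_1)

# Proposed behaviour: recompute from the latest version of every file slice.
latest = dict(before)
latest["fs1"] = fs1_1
recomputed = aggregate(latest.values())

print("merged with old record:", merged_with_old)
print("recomputed from latest:", recomputed)
```

In this sketch the merged record keeps the stale minimum from fs1_0 and counts the values of fs1 twice, while the recomputed record reflects only fs1_1 and fs2_0 - fs10_0.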
--
This message was sent by Atlassian Jira
(v8.20.10#820010)