Prashant Wason created HUDI-3178:
------------------------------------
Summary: Metadata table compaction can include invalid updates
from failed actions on dataset
Key: HUDI-3178
URL: https://issues.apache.org/jira/browse/HUDI-3178
Project: Apache Hudi
Issue Type: Bug
Reporter: Prashant Wason
Fix For: 0.10.1
Metadata Table v2 performs an inline compaction once a deltacommit has been
written.
Timeline:
(on dataset) t1.commit.requested
(on dataset) t1.commit.inflight
---- all parquet writes complete here, WriteStatus generated ----
(on metadata table) t1.deltacommit.requested
(on metadata table) t1.deltacommit.inflight
(on metadata table) t1.deltacommit
---- deltacommit completed ----
(on metadata table) t1-001.compaction.requested
(on metadata table) t1-001.compaction.inflight
(on metadata table) t1-001.commit
If t1.commit then fails on the dataset, the metadata table has already merged
information from t1.commit into its base files, and that information will be
returned to readers. The metadata table reader logic only validates
deltacommits against completed instants on the dataset timeline and assumes
that a base file is always valid.
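
The failure mode can be reproduced with a small toy model. This is only a
sketch of the behavior described above, not Hudi code: the class
ToyMetadataTable, its methods, and the file name f1.parquet are hypothetical
names introduced for illustration. It assumes that compaction merges every
log block into the base file and that the reader filters only log blocks.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadataTable:
    """Hypothetical stand-in for the metadata table (not the Hudi API)."""
    # Records already compacted; the instant that produced them is no longer tracked.
    base_file: set = field(default_factory=set)
    # Deltacommit instant -> file names recorded by that deltacommit.
    log_blocks: dict = field(default_factory=dict)

    def delta_commit(self, instant, files):
        self.log_blocks[instant] = set(files)

    def compact(self):
        # Assumption: compaction merges all log blocks without checking
        # whether the matching dataset commit has completed.
        for files in self.log_blocks.values():
            self.base_file |= files
        self.log_blocks.clear()

    def read(self, completed_dataset_instants):
        # The base file is trusted unconditionally.
        visible = set(self.base_file)
        # Log blocks are filtered against completed dataset instants.
        for instant, files in self.log_blocks.items():
            if instant in completed_dataset_instants:
                visible |= files
        return visible


def simulate(compact_inline):
    table = ToyMetadataTable()
    table.delta_commit("t1", ["f1.parquet"])
    if compact_inline:
        table.compact()  # t1-001 compaction runs before t1.commit completes
    # t1.commit fails on the dataset, so no dataset instant is completed.
    return table.read(completed_dataset_instants=set())
```

In this model, simulate(False) returns an empty set because the t1 log block
is filtered out, while simulate(True) returns the file written by the failed
t1, since the record now lives in the base file and bypasses the filter.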
--
This message was sent by Atlassian Jira
(v8.20.1#820001)