Prashant Wason created HUDI-3178:
------------------------------------

             Summary: Metadata table compaction can include invalid updates 
from failed actions on dataset
                 Key: HUDI-3178
                 URL: https://issues.apache.org/jira/browse/HUDI-3178
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Prashant Wason
             Fix For: 0.10.1


Metadata Table v2 performs an inline compaction once a deltacommit has been 
written. 

Timeline:
  (on dataset) t1.commit.requested
  (on dataset) t1.commit.inflight
---- all parquet writes complete here, WriteStatus generated ----
    (on metadata table) t1.deltacommit.requested
    (on metadata table) t1.deltacommit.inflight
    (on metadata table) t1.deltacommit
---- deltacommit completed ----
    (on metadata table) t1-001.compaction.requested
    (on metadata table) t1-001.compaction.inflight
    (on metadata table) t1-001.commit

If t1.commit then fails on the dataset, the metadata table has already merged 
the information from t1 into its base files, and that information will be 
returned to readers. The metadata table reader logic only validates 
deltacommits against the completed instants on the dataset timeline and 
assumes a base file is always valid.
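
The failure mode can be illustrated with a small toy model. This is only a 
sketch of the behaviour described above, not Hudi code: the MetadataTable 
class, its methods, and the file name f1.parquet are hypothetical names 
chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataTable:
    """Toy model of the metadata table (hypothetical, not Hudi's API)."""

    # Entries already merged into base files: file name -> originating instant
    base_files: dict = field(default_factory=dict)
    # Uncompacted deltacommits: instant -> list of file names
    delta_commits: dict = field(default_factory=dict)

    def delta_commit(self, instant, files):
        self.delta_commits[instant] = list(files)

    def compact(self):
        # Inline compaction merges every deltacommit into the base file
        # without consulting the dataset timeline.
        for instant, files in self.delta_commits.items():
            for name in files:
                self.base_files[name] = instant
        self.delta_commits.clear()

    def read(self, completed_dataset_instants):
        # Base file entries are trusted unconditionally.
        visible = set(self.base_files)
        # Only deltacommits are validated against the dataset timeline.
        for instant, files in self.delta_commits.items():
            if instant in completed_dataset_instants:
                visible.update(files)
        return visible


# t1 never completes on the dataset, so only t0 is a completed instant.
completed = {"t0"}

# Without compaction, the t1 deltacommit is filtered out by the reader.
no_compaction = MetadataTable()
no_compaction.delta_commit("t1", ["f1.parquet"])
files_before_compaction = no_compaction.read(completed)

# With inline compaction, the t1 entry lands in the base file and leaks.
with_compaction = MetadataTable()
with_compaction.delta_commit("t1", ["f1.parquet"])
with_compaction.compact()
files_after_compaction = with_compaction.read(completed)
```

In this model the reader hides f1.parquet while it sits in a deltacommit, but 
returns it once compaction has moved it into the base file, even though t1 
never completed on the dataset.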





--
This message was sent by Atlassian Jira
(v8.20.1#820001)
