yihua commented on pull request #5186: URL: https://github.com/apache/hudi/pull/5186#issuecomment-1086039243
After a discussion, simply using `hoodie.table.metadata.enable` with two values (true/false) may not guarantee atomic deletion of the metadata table in `.hoodie/metadata`. Let's go through the following failure scenario based on the current logic:

1. Metadata table (MDT) is enabled in HoodieWriteConfig (HWC) for the writer. No failure; MDT is initialized and the commit succeeds. `hoodie.table.metadata.enable` is set to true in the table config.
2. Before the next commit, MDT is disabled in HWC. No failure; MDT is removed. `hoodie.table.metadata.enable` is set to false in the table config.
3. Before the next commit, MDT is enabled in HWC. MDT is created, and the job fails before updating `hoodie.table.metadata.enable`. Now `hoodie.table.metadata.enable` is still false, but there is an MDT in `.hoodie/metadata`.
4. Before the next commit, MDT is disabled in HWC. The writer does not clean up the MDT because `hoodie.table.metadata.enable` is false.

In such a case, the deletion of the MDT is not complete. The lingering MDT can cause correctness issues later on if MDT is enabled again in HWC. We need three states, like a Hudi instant in the timeline (requested, inflight, and completed), to identify failure scenarios like the one above.

Given the complexity of the potential new design, this may not make it into the 0.11.0 release. Removing the blocker label.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
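To make the three-state idea in the comment concrete, here is a minimal sketch that replays the four-step scenario against an in-memory model. Everything in it is hypothetical and not Hudi code: `MdtStateSketch`, `TableState`, `MdtAction`, `MdtState`, and `reconcile` are illustrative names, the `mdtDirExists` flag stands in for the `.hoodie/metadata` directory, and the assumption is that the pending action and its state would be persisted alongside the table config.

```java
// Hypothetical sketch, not Hudi's implementation: a three-state marker for
// metadata table (MDT) creation and deletion, modeled on instant states.
final class MdtStateSketch {

    // Assumed states, mirroring requested / inflight / completed on the timeline.
    enum MdtState { REQUESTED, INFLIGHT, COMPLETED }

    // The operation the marker refers to.
    enum MdtAction { CREATE, DELETE }

    // In-memory stand-in for what would be persisted with the table config.
    static final class TableState {
        MdtAction action = MdtAction.DELETE;   // initially: no MDT
        MdtState state = MdtState.COMPLETED;   // and nothing pending
        boolean mdtDirExists = false;          // stands in for .hoodie/metadata
    }

    /**
     * Brings the table in line with the writer's setting.
     * failBeforeComplete simulates a job failure after the directory was
     * touched but before the marker reached COMPLETED.
     */
    static void reconcile(TableState t, boolean enableInWriter, boolean failBeforeComplete) {
        MdtAction wanted = enableInWriter ? MdtAction.CREATE : MdtAction.DELETE;

        // Skip work only if the last action fully completed AND matches the
        // writer's setting. An INFLIGHT or REQUESTED marker always forces work.
        if (t.state == MdtState.COMPLETED && t.action == wanted) {
            return;
        }

        // Record intent before touching the directory.
        t.action = wanted;
        t.state = MdtState.REQUESTED;

        // Apply the change; a failure here leaves the marker INFLIGHT.
        t.state = MdtState.INFLIGHT;
        t.mdtDirExists = (wanted == MdtAction.CREATE);
        if (failBeforeComplete) {
            throw new IllegalStateException("simulated job failure");
        }

        t.state = MdtState.COMPLETED;
    }
}
```

In this model, the failed enable in step 3 leaves the marker at `CREATE`/`INFLIGHT`, so the disable in step 4 no longer looks like a no-op and the lingering directory is removed. With a plain true/false flag, step 4 would see "false" and skip the cleanup.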
