kbuci commented on PR #18160: URL: https://github.com/apache/hudi/pull/18160#issuecomment-3881836875
> Not sure I understand the exact sequence here. can you help me understand.
>
> Any write to mdt is guarded by data table lock and hence its fair to say, mdt is a single writer table.
>
> whenever we wanted to apply any commits or clean or rollback from data table to mdt, we instantiate a new mdt writer, apply the commit/clean/rollback and close it out.
>
> Which means, every new write to mdt, will always have updated mdt timeline. So, how come writer 2 could see a stale timeline in mdt, if writer 1 just happened to update mdt timeline.

@nsivabalan Sure, let me clarify. This isn't actually a multi-writer scenario. The sequence is:

- The `processAndCommit` call above (https://github.com/apache/hudi/pull/18160/changes#diff-65dd70e3cb912c49b3972598c897e47c8ef08f687789f86cb33f567006ef50e9R215) runs.
- It indirectly calls `commitInternal`.
- `commitInternal` then calls `metadataMetaClient = rollbackFailedWrites(dataWriteConfig, writeClient, metadataMetaClient);`, which rolls back the inflight deltacommits in the MDT timeline, including the `deltaCommitInstant` that is the target of our rollback.

Although that `rollbackFailedWrites` call reloads the MDT metaclient/timeline, the code at https://github.com/apache/hudi/pull/18160/changes#diff-65dd70e3cb912c49b3972598c897e47c8ef08f687789f86cb33f567006ef50e9L218 doesn't see the refreshed timeline. As a result, we attempt to roll back an MDT deltacommit that has already been rolled back.
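To make the stale-reference problem concrete, here is a minimal, single-writer sketch. It is a simplified model, not Hudi code: `MdtStorage`, `TimelineSnapshot`, `rollbackWithStaleTimeline`, and `rollbackWithRefreshedTimeline` are hypothetical names, and the `rollbackFailedWrites` method here only mimics the behavior described above (roll back all inflight deltacommits and return a freshly loaded timeline), not its real signature. The point it shows is that a timeline snapshot taken before the commit path ran still reports the target deltacommit as inflight, whereas the snapshot returned after the rollback does not.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleTimelineSketch {

    // Hypothetical immutable timeline snapshot: the deltacommits that were
    // still inflight at the moment the snapshot was loaded.
    record TimelineSnapshot(List<String> inflightInstants) {
        boolean isInflight(String instant) {
            return inflightInstants.contains(instant);
        }
    }

    // Hypothetical stand-in for the MDT storage, i.e. the source of truth
    // that rollbacks mutate. Single writer only.
    static final class MdtStorage {
        private final List<String> inflight = new ArrayList<>();

        void startDeltaCommit(String instant) {
            inflight.add(instant);
        }

        TimelineSnapshot loadTimeline() {
            return new TimelineSnapshot(List.copyOf(inflight));
        }

        // Mimics the behavior described above: roll back every inflight
        // deltacommit, then return a freshly loaded timeline.
        TimelineSnapshot rollbackFailedWrites() {
            inflight.clear();
            return loadTimeline();
        }

        void rollback(String instant) {
            if (!inflight.remove(instant)) {
                throw new IllegalStateException(
                    "Instant " + instant + " is not inflight (already rolled back?)");
            }
        }
    }

    // Problematic order: the decision uses a snapshot taken before the
    // commit path rolled the target back, and the refreshed one is dropped.
    static void rollbackWithStaleTimeline(MdtStorage storage, String target) {
        TimelineSnapshot snapshot = storage.loadTimeline();
        storage.rollbackFailedWrites(); // refreshed timeline is discarded
        if (snapshot.isInflight(target)) {
            storage.rollback(target); // fails: target was already rolled back
        }
    }

    // Guarded order: decide using the timeline produced by the rollback step.
    // Returns true only if the target still needed an explicit rollback.
    static boolean rollbackWithRefreshedTimeline(MdtStorage storage, String target) {
        TimelineSnapshot refreshed = storage.rollbackFailedWrites();
        if (refreshed.isInflight(target)) {
            storage.rollback(target);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String target = "001"; // hypothetical instant time

        MdtStorage staleCase = new MdtStorage();
        staleCase.startDeltaCommit(target);
        try {
            rollbackWithStaleTimeline(staleCase, target);
        } catch (IllegalStateException e) {
            System.out.println("stale timeline: " + e.getMessage());
        }

        MdtStorage refreshedCase = new MdtStorage();
        refreshedCase.startDeltaCommit(target);
        System.out.println("explicit rollback needed: "
            + rollbackWithRefreshedTimeline(refreshedCase, target));
    }
}
```

Under these assumptions, the stale path fails on its second rollback attempt, while the refreshed path sees that nothing is left to roll back and skips it. The design choice the sketch illustrates is simply to re-read (or reuse the returned) timeline after any step that may have rolled back instants, rather than reusing a reference captured earlier.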
