VitoMakarevich commented on issue #10964: URL: https://github.com/apache/hudi/issues/10964#issuecomment-2040415437
Update: I dug into the code in `clusteringHandleUpdate` and see the following:

- Updates rejected: the write fails.
- Updates accepted and `hoodie.clustering.rollback.pending.replacecommit.on.conflict` is `true`: the pending clustering instants that conflict with the updated records are rolled back.
- Updates accepted and `hoodie.clustering.rollback.pending.replacecommit.on.conflict` is `false`: the pending clustering instants are left on the commit timeline, and the updates are written to the previous files.

So it looks like switching these two settings:

- [hoodie.clustering.updates.strategy](https://hudi.apache.org/docs/configurations/#hoodieclusteringupdatesstrategy) -> `org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy` (non-default)
- [hoodie.clustering.rollback.pending.replacecommit.on.conflict](https://hudi.apache.org/docs/configurations/#hoodieclusteringrollbackpendingreplacecommitonconflict) -> `true` (non-default)

is generally safe when all operations run inline with a single writer. For example, if a commit fails in the middle of clustering, the subsequent commit will synchronously roll back the clustering instants and write the updates into the old files.

Can someone confirm? @nsivabalan
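To make the three cases concrete, here is a minimal sketch in plain Python of the decision flow as I read it. It is not Hudi's actual implementation: `handle_update`, `Outcome`, and the instant/file-group data are hypothetical names for illustration. Only the two config keys come from the comment; the class name `SparkAllowUpdateStrategy` as the update-accepting strategy and `SparkRejectUpdateStrategy` as the default are assumptions on my part.

```python
from dataclasses import dataclass, field

# Config keys quoted in the comment above.
UPDATES_STRATEGY = "hoodie.clustering.updates.strategy"
ROLLBACK_ON_CONFLICT = "hoodie.clustering.rollback.pending.replacecommit.on.conflict"

# Assumed strategy class names (accepting vs. rejecting updates).
ALLOW = "org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy"
REJECT = "org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy"


@dataclass
class Outcome:
    """Hypothetical result type, used only in this sketch."""
    write_fails: bool
    rolled_back_instants: list = field(default_factory=list)
    pending_instants_left: list = field(default_factory=list)


def handle_update(options, pending_clustering, updated_file_groups):
    """Model the three outcomes described above.

    pending_clustering maps a pending clustering instant time to the set of
    file group ids it plans to rewrite (hypothetical data shape).
    """
    updated = set(updated_file_groups)
    conflicting = sorted(
        instant for instant, groups in pending_clustering.items() if groups & updated
    )
    all_pending = sorted(pending_clustering)

    if not conflicting:
        # No overlap with pending clustering: nothing special happens.
        return Outcome(False, [], all_pending)

    # Assumption: rejecting updates is the default strategy.
    if options.get(UPDATES_STRATEGY, REJECT) == REJECT:
        return Outcome(True, [], all_pending)

    # Updates accepted: behaviour depends on the rollback flag (default false).
    rollback = str(options.get(ROLLBACK_ON_CONFLICT, "false")).lower() == "true"
    if rollback:
        left = sorted(set(all_pending) - set(conflicting))
        return Outcome(False, conflicting, left)
    return Outcome(False, [], all_pending)


if __name__ == "__main__":
    pending = {"001": {"fg1", "fg2"}, "002": {"fg3"}}
    opts = {UPDATES_STRATEGY: ALLOW, ROLLBACK_ON_CONFLICT: "true"}
    print(handle_update(opts, pending, ["fg1"]))
```

With the two proposed settings, only the pending clustering instant that touches an updated file group is rolled back in this model; whether Hudi behaves exactly this way is what the comment asks to have confirmed.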
