jtmzheng opened a new issue, #8362: URL: https://github.com/apache/hudi/issues/8362
I'm trying to understand OCC and how table services (especially cleaning, but also compaction and clustering) should or need to be deployed when there are multiple writers to a single table. The RFC (https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers, under "Scheduling Table Management Services") implies that a lock is acquired while scheduling table service operations. Is that how it is implemented? That suggests to me that cleaning, compaction, clustering, etc. *can* be enabled for multiple writers on the same table, but it's not clear from the docs whether this is the case. E.g., can `hoodie.clean.automatic` be enabled for all writers when there are multiple writers?

The docs (https://hudi.apache.org/docs/concurrency_control/#enabling-multi-writing) note that `hoodie.cleaner.policy.failed.writes = LAZY` must be set:

> Cleaning policy for failed writes to be used. Hudi will delete any files written by failed writes to re-claim space. Choose to perform this rollback of failed writes eagerly before every writer starts (only supported for single writer) or lazily by the cleaner (required for multi-writers)

What does this mean? Does it mean the cleaner can only be enabled for a single writer (or run independently)?

Thanks!
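For context, here is a minimal sketch of the writer options the linked concurrency-control docs describe for multi-writing. This is only an illustration, not a confirmed answer to the question: the ZooKeeper host, port, lock key, and base path values are placeholders, and whether every writer should also set `hoodie.clean.automatic` is exactly what's being asked.

```python
# Sketch of per-writer Hudi options for OCC multi-writing, per the docs
# linked above. ZooKeeper connection values below are placeholders.
hudi_multi_writer_options = {
    # Enable optimistic concurrency control so multiple writers can
    # commit to the same table.
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    # Required for multi-writers: roll back failed writes lazily via the
    # cleaner, rather than eagerly at writer startup (single-writer only).
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    # A distributed lock provider coordinates the concurrent commits.
    "hoodie.write.lock.provider":
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "zk-host",         # placeholder
    "hoodie.write.lock.zookeeper.port": "2181",           # placeholder
    "hoodie.write.lock.zookeeper.lock_key": "my_table",   # placeholder
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",  # placeholder
}
```

Each writer would pass these via `.options(**hudi_multi_writer_options)` on its Spark DataFrame writer, alongside its normal Hudi write options.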
