jtmzheng opened a new issue, #8362:
URL: https://github.com/apache/hudi/issues/8362

   I'm trying to understand OCC and how table services (especially cleaning, but also compaction and clustering) should or need to be deployed when multiple writers write to a single table. The RFC (https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers, under "Scheduling Table Management Services") implies that a lock is acquired while scheduling table service operations - is that how it is implemented? That implies to me that cleaning, compaction, clustering, etc. *can* be enabled for multiple writers to the same table, but it's not clear from the docs whether this is the case.
   
   E.g., can `hoodie.clean.automatic` be enabled on all writers when there are multiple writers?
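   
   For context, the multi-writer setup I have in mind is roughly the following, based on the concurrency control docs (the ZooKeeper-based lock provider is just one of the documented options; the host/port/path values are placeholders):
   
   ```properties
   # Enable OCC for multi-writer, per the concurrency control docs
   hoodie.write.concurrency.mode=optimistic_concurrency_control
   hoodie.cleaner.policy.failed.writes=LAZY
   hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
   hoodie.write.lock.zookeeper.url=zk-host
   hoodie.write.lock.zookeeper.port=2181
   hoodie.write.lock.zookeeper.lock_key=my_table
   hoodie.write.lock.zookeeper.base_path=/hudi/locks
   
   # The setting in question - is it safe to leave this on for every writer?
   hoodie.clean.automatic=true
   ```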
   
   The docs 
(https://hudi.apache.org/docs/concurrency_control/#enabling-multi-writing)  
note that `hoodie.cleaner.policy.failed.writes = LAZY` must be set:
   
   > Cleaning policy for failed writes to be used. Hudi will delete any files 
written by failed writes to re-claim space. Choose to perform this rollback of 
failed writes eagerly before every writer starts (only supported for single 
writer) or lazily by the cleaner (required for multi-writers)
   
   What does this mean in practice? Does it mean automatic cleaning can only be enabled on a single writer (or must run as a separate, independent job)?
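   
   If the answer is "run it independently", my understanding is that the standalone cleaner utility is the way to do that - something like the sketch below (paths, master, and the exact flags are my best guess from the utilities docs; I'm assuming `HoodieCleaner` is the intended entry point):
   
   ```sh
   spark-submit \
     --class org.apache.hudi.utilities.HoodieCleaner \
     hudi-utilities-bundle.jar \
     --target-base-path s3://my-bucket/my_table \
     --hoodie-conf hoodie.cleaner.policy.failed.writes=LAZY
   ```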
   
   Thanks!
   
   
   

