kbuci opened a new issue, #17908:
URL: https://github.com/apache/hudi/issues/17908

   ### Task Description
   
   **What needs to be done:**
   - Implement https://issues.apache.org/jira/browse/HUDI-9407 so that users can
   specify `OPTIMISTIC_CONCURRENCY_CONTROL` as the concurrency mode when the
   metadata table writer's config is created in
   `org.apache.hudi.metadata.HoodieMetadataWriteUtils#createMetadataWriteConfig`.
   
   Add a new config that lists the table services that the metadata table
   schedules inline but executes asynchronously. Initially we can choose to
   support just compaction and log compaction. If enabled:
   - The metadata table writer should still schedule all table service plans
   inline, but should not execute plans of the listed types inline. Note that
   this means it should also not retry these plans in
   `org.apache.hudi.metadata.HoodieBackedTableMetadataWriter#runPendingTableServicesOperationsAndRefreshTimeline`.
 
   - Another concurrent writer can initialize a metadata table writer (passing
   OCC as the concurrency mode in the metadata write config) and execute these
   plans.
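
The split between inline scheduling and inline execution described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the enum, the `PendingPlan` type, and the comma-separated config format are all hypothetical and are not existing Hudi APIs. It only shows the filtering rule: every plan is still scheduled, but plans whose type is listed in the new config are skipped by the inline execute/retry path.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: every name below is hypothetical and none of it is Hudi API.
class AsyncMetadataServicesSketch {

  /** Hypothetical table service types for the metadata table. */
  enum ServiceType { COMPACTION, LOG_COMPACTION, CLEAN }

  /** A scheduled (pending) plan: its service type and the instant it was scheduled at. */
  static final class PendingPlan {
    final ServiceType type;
    final String instantTime;

    PendingPlan(ServiceType type, String instantTime) {
      this.type = type;
      this.instantTime = instantTime;
    }
  }

  /** Parses a comma-separated config value (assumed format) into a set of service types. */
  static Set<ServiceType> parseAsyncServices(String configValue) {
    Set<ServiceType> services = EnumSet.noneOf(ServiceType.class);
    if (configValue == null || configValue.trim().isEmpty()) {
      return services; // feature disabled: nothing is deferred
    }
    for (String token : configValue.split(",")) {
      services.add(ServiceType.valueOf(token.trim().toUpperCase(Locale.ROOT)));
    }
    return services;
  }

  /** Plans the ingestion writer may execute or retry inline: those NOT listed as async. */
  static List<PendingPlan> plansToRunInline(List<PendingPlan> pending, Set<ServiceType> async) {
    List<PendingPlan> inline = new ArrayList<>();
    for (PendingPlan plan : pending) {
      if (!async.contains(plan.type)) {
        inline.add(plan);
      }
    }
    return inline;
  }

  public static void main(String[] args) {
    Set<ServiceType> async = parseAsyncServices("compaction, log_compaction");
    List<PendingPlan> pending = new ArrayList<>();
    pending.add(new PendingPlan(ServiceType.COMPACTION, "001"));
    pending.add(new PendingPlan(ServiceType.CLEAN, "002"));
    pending.add(new PendingPlan(ServiceType.LOG_COMPACTION, "003"));

    // Only plans whose type is not deferred are handed to the inline path.
    for (PendingPlan plan : plansToRunInline(pending, async)) {
      System.out.println("inline: " + plan.type + " @ " + plan.instantTime);
    }
  }
}
```

In this sketch the deferred plans simply stay pending on the timeline, where a separate writer configured with OCC could pick them up.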
   
   **Why this task is needed:**
   For datasets with large metadata table partitions (such as RECORD_INDEX), we
   cannot execute metadata table compaction during the write, because that
   increases write runtimes. Instead, we only schedule the plan inline and have
   a separate platform execute these plans. We need the features above to
   guarantee that specific table services on the metadata table never execute
   inline during a write, and that outside concurrent writers can safely
   execute these plans instead. This "async" execution takes the table lock
   only when transitioning instants, so that it does not block the writer job.
   We can upstream our implementations of the above once we reach consensus.
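
The locking behavior mentioned above (hold the table lock only while transitioning instants, not while the plan runs) could look roughly like the sketch below. It is an assumption-laden illustration: a plain `ReentrantLock` stands in for the table lock, the three-state enum is a simplified model of an instant's lifecycle, and the class and method names are hypothetical rather than Hudi code.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Illustrative sketch only: a ReentrantLock stands in for the table lock; all names are hypothetical.
class LockOnTransitionSketch {

  /** Simplified model of an instant's lifecycle. */
  enum InstantState { REQUESTED, INFLIGHT, COMPLETED }

  private final ReentrantLock tableLock;
  private volatile InstantState state = InstantState.REQUESTED;

  LockOnTransitionSketch(ReentrantLock tableLock) {
    this.tableLock = tableLock;
  }

  InstantState state() {
    return state;
  }

  /** Moves the instant from the expected state to the next one while holding the lock. */
  private void transition(InstantState expected, InstantState next) {
    tableLock.lock();
    try {
      if (state != expected) {
        throw new IllegalStateException("Expected " + expected + " but was " + state);
      }
      state = next;
    } finally {
      tableLock.unlock();
    }
  }

  /** Runs the plan; the lock is held only around the two state transitions. */
  <T> T execute(Supplier<T> planWork) {
    transition(InstantState.REQUESTED, InstantState.INFLIGHT);
    T result = planWork.get(); // long-running work happens without the lock
    transition(InstantState.INFLIGHT, InstantState.COMPLETED);
    return result;
  }

  public static void main(String[] args) {
    ReentrantLock lock = new ReentrantLock();
    LockOnTransitionSketch executor = new LockOnTransitionSketch(lock);
    // The "work" here just reports whether the lock is held while it runs.
    boolean heldDuringWork = executor.execute(lock::isHeldByCurrentThread);
    System.out.println("lock held during work: " + heldDuringWork);
    System.out.println("final state: " + executor.state());
  }
}
```

Because the lock is released before the plan's work starts, the ingestion writer would only contend with the async executor during the brief transitions.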
   
   
   ### Task Type
   
   Code improvement/refactoring
   
   ### Related Issues
   
   **Parent feature issue:** (if applicable )
   **Related issues:**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

