[ 
https://issues.apache.org/jira/browse/HUDI-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950620#comment-17950620
 ] 

sivabalan narayanan commented on HUDI-9281:
-------------------------------------------

We have an end-to-end working patch: [https://github.com/apache/hudi/pull/13236]

 

But we are starting to split it up into multiple smaller patches.

1. Metadata Upsert Partitioner [https://github.com/apache/hudi/pull/13001] 
2. Metadata Spark CommitActionExecutor. 
[https://github.com/apache/hudi/pull/13005] 
+ upsertPrepped w/ additional argument in HoodieTable 
+ upsertPrepped w/ WriteClient, but not wired to any caller. 
Do not wire it in w/ metadata table. Should be landable though. 
3. Adding write config: engine-based enablement, table-version-based 
enablement. 
4. NBCC in mdt. 
5. MDT writer APIs and implementation. FILES and RLI only, w/o actual 
integration. 
Later, the writer should support all other partitions. Again, no wiring yet, 
but when wiring is done, it should work. 
6. Support upsert prepped partial w/ Write client. Check how we get the list 
of file slices w/o triggering the DAG. 
We should try and support wc.upsert(batch1, t1), wc.upsert(batch2, t1); 
7. MDT writer instance lifecycle management. 
8. Sort out rollbacks in MDT. Support both legacy and streaming writes. 
9. Write Status, LeanWriteStatus etc. 
10. Enabling streaming writes. + wiring MDT w/ new WC apis. 
startCommit, upsert, commit just for Ingestion writer. 
+ Row writer integration. 
11. Add support for compaction and log compaction. 
12. Add support for clustering. 
13. WriteStatusHandler callback optimization. 

w/ this, we should be able to land the patch in its entirety w/o any issues. 

14. Col stats support, standalone. Write handle can generate stats w/o any 
issues, but w/o actual integration. 
15. Integrate Col stats w/ streaming writes. 
16. Integrate PSI w/ streaming writes. 
17. Integrate SI w/ streaming writes. 
18. Integrate bloom filter w/ streaming writes. 

 

> DT MDT DAG rewrite for efficient streaming writes: Impl 
> --------------------------------------------------------
>
>                 Key: HUDI-9281
>                 URL: https://issues.apache.org/jira/browse/HUDI-9281
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
