[
https://issues.apache.org/jira/browse/HUDI-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950620#comment-17950620
]
sivabalan narayanan edited comment on HUDI-9281 at 5/10/25 12:16 AM:
---------------------------------------------------------------------
We have an end-to-end working [https://github.com/apache/hudi/pull/13236]
But started splitting it up into multiple smaller patches.
1. Metadata Upsert Partitioner. Siva.
[https://github.com/apache/hudi/pull/13001]
2. Metadata Spark CommitActionExecutor. Siva.
[https://github.com/apache/hudi/pull/13005]
+ upsertPrepped w/ additional argument in HoodieTable
+ upsertPrepped w/ WriteClient. but not wiring to any caller.
do not wire it in w/ metadata table. should be landable though.
3. Adding write config. engine based enablement, table version based
enablement. Siva.
4. NBCC in mdt. Siva.
5. MDT writer apis and implementation. FILES and RLI only w/o actual
integration.
later write should support all other partitions. Again, no wiring yet. but when
wiring is done, it should work. Lokesh.
6. Support upsert prepped partial w/ Write client. check how we get the list
of file slices w/o triggering the dag.
we should try and support wc.upsert(batch1, t1), wc.upsert(batch2, t1); Siva.
7. MDT writer instance lifecycle management. Lokesh.
8. Sort out rollbacks in mdt. support both legacy and streaming writes. Lokesh.
9. Write Status, LeanWriteStatus etc. Lokesh.
10. Enabling streaming writes. + wiring MDT w/ new WC apis.
startCommit, upsert, commit just for Ingestion writer.
+ Row writer integration. Lokesh.
11. Add support for compaction and log compaction. Siva.
12. Add support for clustering. Lokesh.
13. WriteStatusHandler callback optimization. Lokesh.
w/ this, we should be able to land the patch in its entirety w/o any issues.
14. Col stats support. standalone. write handle can generate stats w/o any
issues. w/o actual integration. Lokesh.
15. Integrate Col stats w/ streaming writes. Siva.
16. Integrate PSI w/ streaming writes. Siva.
17. Integrate SI w/ streaming writes. Lokesh.
18. Integrate bloom filter w/ streaming writes. Lokesh
was (Author: shivnarayan):
We have an end-to-end working [https://github.com/apache/hudi/pull/13236]
But started splitting it up into multiple smaller patches.
1. Metadata Upsert Partitioner [https://github.com/apache/hudi/pull/13001]
2. Metadata Spark CommitActionExecutor.
[https://github.com/apache/hudi/pull/13005]
+ upsertPrepped w/ additional argument in HoodieTable
+ upsertPrepped w/ WriteClient. but not wiring to any caller.
do not wire it in w/ metadata table. should be landable though.
3. Adding write config. engine based enablement, table version based
enablement.
4. NBCC in mdt.
5. MDT writer apis and implementation. FILES and RLI only w/o actual
integration.
later write should support all other partitions. Again, no wiring yet. but when
wiring is done, it should work.
6. Support upsert prepped partial w/ Write client. check how we get the list
of file slices w/o triggering the dag.
we should try and support wc.upsert(batch1, t1), wc.upsert(batch2, t1);
7. MDT writer instance lifecycle management.
8. Sort out rollbacks in mdt. support both legacy and streaming writes.
9. Write Status, LeanWriteStatus etc.
10. Enabling streaming writes. + wiring MDT w/ new WC apis.
startCommit, upsert, commit just for Ingestion writer.
+ Row writer integration.
11. Add support for compaction and log compaction.
12. Add support for clustering.
13. WriteStatusHandler callback optimization.
w/ this, we should be able to land the patch in its entirety w/o any issues.
14. Col stats support. standalone. write handle can generate stats w/o any
issues. w/o actual integration.
15. Integrate Col stats w/ streaming writes.
16. Integrate PSI w/ streaming writes.
17. Integrate SI w/ streaming writes.
18. Integrate bloom filter w/ streaming writes.
> DT MDT DAG rewrite for efficient streaming writes: Impl
> --------------------------------------------------------
>
> Key: HUDI-9281
> URL: https://issues.apache.org/jira/browse/HUDI-9281
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>