[ https://issues.apache.org/jira/browse/HUDI-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-2318:
--------------------------------------
Description:
Currently, the multi-table deltastreamer supports only COW tables and only
run-once mode. We need to enhance it considerably and make it usable across
different scenarios.
There have been requests from the community for this. A typical use case:
I have 1000+ tables and I wish to ingest all of them into Hudi efficiently. I
don't want to run 1000+ deltastreamer instances, since I would have to allocate
resources for every deltastreamer instance.
Requirements
* Add MOR support to the multi-table deltastreamer.
* Add continuous mode support to the multi-table deltastreamer.
* Add support for syncing different tables concurrently. Currently, each table
is synced serially, which may not scale to 1000+ tables. At the same time, we
may not want to sync all 1000+ tables at once; a bounded thread pool would give
a controlled level of concurrency.
** Check out [https://github.com/apache/hudi/issues/2175] for ingesting into
multiple Hudi tables using Spark Structured Streaming. We could also see
whether it can be added as a utility.
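The bounded thread-pool idea in the last requirement could look roughly like the minimal Python sketch below. It assumes each table's ingest-and-sync round can be wrapped in one callable; `sync_table` and `sync_all_tables` are hypothetical names used only for illustration and are not part of Hudi's HoodieMultiTableDeltaStreamer API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def sync_table(table_name):
    """Hypothetical placeholder for one table's ingest-and-sync round.

    In a real multi-table deltastreamer this would read the table's source,
    write to its Hudi table, and run any configured sync (assumption).
    """
    time.sleep(0.001)  # simulate work
    return table_name


def sync_all_tables(tables, max_parallel=8, sync_fn=sync_table):
    """Sync every table using at most `max_parallel` worker threads.

    Returns (results, failures): results maps table -> sync_fn's return value,
    failures maps table -> the exception it raised, so one failing table
    does not stop the others.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(sync_fn, t): t for t in tables}
        for future in as_completed(futures):
            table = futures[future]
            try:
                results[table] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[table] = exc
    return results, failures


if __name__ == "__main__":
    tables = [f"db.table_{i}" for i in range(1000)]
    ok, failed = sync_all_tables(tables, max_parallel=16)
    print(f"synced {len(ok)} tables, {len(failed)} failures")
```

Here `max_parallel` caps how many tables are in flight at once, which addresses not wanting to sync all 1000+ tables concurrently, while per-table failure collection keeps one bad table from aborting the whole round.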
was:
Currently, the multi-table deltastreamer supports only COW tables and only
run-once mode. We need to enhance it considerably and make it usable across
different scenarios.
There have been requests from the community for this. A typical use case:
I have 1000+ tables and I wish to ingest all of them into Hudi efficiently. I
don't want to run 1000+ deltastreamer instances, since I would have to allocate
resources for every deltastreamer instance.
Requirements
* Add MOR support to the multi-table deltastreamer.
* Add continuous mode support to the multi-table deltastreamer.
* Add support for syncing different tables concurrently. Currently, each table
is synced serially, which may not scale to 1000+ tables. At the same time, we
may not want to sync all 1000+ tables at once; a bounded thread pool would give
a controlled level of concurrency.
> Enhance and stabilize multi-table deltastreamer
> ----------------------------------------------
>
> Key: HUDI-2318
> URL: https://issues.apache.org/jira/browse/HUDI-2318
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Utilities
> Reporter: sivabalan narayanan
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.3.4#803005)