ziudu opened a new issue #3344:
URL: https://github.com/apache/hudi/issues/3344
Dear,
We're new to Hudi and we would like to know what the best way is to ingest
a large number of tables. For example, in our production environment we have
about 70 MySQL databases with >1000 tables in total. We'd prefer them all to be
ingested in continuous mode, in the spirit of a data lake.
- option 1, DeltaStreamer: each table requires its own DeltaStreamer job, which
consumes too many resources. (Is it possible to submit multiple DeltaStreamers
in one Spark context?)
- option 2, MultiTableDeltaStreamer: it doesn't support MOR (merge-on-read) yet,
which is our preferred table type.
- option 3, write our own data ingestion logic with the Java client, but that
takes some time to build.
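
For reference, option 1 means one spark-submit per table, roughly like the sketch below. This is a minimal illustration of a single-table HoodieDeltaStreamer run in continuous mode with a MOR table type; the jar name, source class, ordering field, paths, and properties file are placeholder assumptions, not values from this issue:

```shell
# Sketch: one HoodieDeltaStreamer job, continuous mode, MERGE_ON_READ table.
# All paths, the source class, and the properties file below are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type MERGE_ON_READ \
  --continuous \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path /data/hudi/db1/table1 \
  --target-table table1 \
  --props /path/to/kafka-source.properties
```

Since `--continuous` keeps the Spark application alive indefinitely, each table holds its executors permanently, which is why running ~1000 of these does not scale.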
Do you have any suggestions?
Thanks,
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]