ziudu opened a new issue #3344:
URL: https://github.com/apache/hudi/issues/3344


   Dear,
   
   We're new to Hudi and would like to know what the best way is to ingest 
a large number of tables. For example, in our production environment we have 
about 70 MySQL databases with more than 1000 tables in total. In the spirit of a 
data lake, we'd prefer to ingest them all in continuous mode. 
   
    - option 1, DeltaStreamer: each table requires its own DeltaStreamer job, 
which consumes too many resources. (Is it possible to submit multiple 
DeltaStreamer instances into one Spark context?) 
    - option 2, MultiTableDeltaStreamer: it doesn't yet support MOR 
(merge-on-read), which is our preferred table type. 
    - option 3, write our own ingestion logic with the Java client, but that 
would take some time.
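   For context, option 1 in continuous mode would look roughly like the sketch 
below; one such long-running Spark job is needed per table, which is where the 
resource cost comes from. The jar path, properties file, source class, and 
target paths are illustrative placeholders, not our actual setup:

```shell
# Hedged sketch: one continuous-mode DeltaStreamer job per table.
# Paths, props file, and source class below are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type MERGE_ON_READ \
  --continuous \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --props file:///path/to/table1-source.properties \
  --target-base-path hdfs:///datalake/db1/table1 \
  --target-table table1
```

   Multiplied by >1000 tables, each submit holds its own driver and executors, 
which is the overhead we'd like to avoid.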
   
   Do you have any suggestions?
   Thanks,

