Re: [I] [SUPPORT] Solution for synchronizing the entire database table in flink [hudi]

via GitHub Wed, 01 Nov 2023 17:52:32 -0700


bajiaolong commented on issue #9965:
URL: https://github.com/apache/hudi/issues/9965#issuecomment-1789890659


   > How many tables are there in your database? This is feasible if you have
   > just a handful of tables, say 20: you can consume the Kafka topic and
   > partition the stream by table name, then pipe each partitioned stream
   > into a Hudi sink. You need to write some DataStream pipelines manually; take
   > https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/HoodiePipeline.java
   > as a single-table example.
   
   1. Why is the number of tables limited to around 20? What is the reason?
   
   2. Currently, the data from all tables in my database is synchronized into
one table, partitioned by the schema and table name of the source database.
When reading downstream, I filter on the table name in the stream, but this
approach is very time-consuming. Is there a streaming read that reads only
specified partitions, so that I can get the data of a single source table?
   
   3. Do you have any suggestions for the second point?
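   
   To make question 2 concrete, here is a minimal sketch of the kind of read being asked about: a Flink SQL streaming read of the shared Hudi table, filtered on the partition columns. It only builds the SQL text in plain Java. The table name `all_tables`, the columns `id`, `payload`, `db_name`, `table_name`, and the path are hypothetical. The options `connector`, `path`, `table.type` and `read.streaming.enabled` are Hudi Flink options; whether the `WHERE` filter is actually applied as partition pruning in streaming mode depends on the Hudi version and is not confirmed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: builds Flink SQL text for reading one source table out of a shared Hudi table. */
final class SingleTableReadSql {

    /** DDL that registers the shared Hudi table in Flink SQL with streaming read enabled. */
    static String createTableDdl(String flinkTable, String basePath) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", basePath);
        options.put("table.type", "MERGE_ON_READ");    // assumption: match your actual table type
        options.put("read.streaming.enabled", "true"); // continuous read instead of a one-off snapshot

        String with = options.entrySet().stream()
            .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
            .collect(Collectors.joining(",\n"));

        // Hypothetical schema: one payload column plus the two partition columns.
        return "CREATE TABLE " + flinkTable + " (\n"
            + "  id STRING,\n"
            + "  payload STRING,\n"
            + "  db_name STRING,\n"
            + "  table_name STRING,\n"
            + "  PRIMARY KEY (id) NOT ENFORCED\n"
            + ") PARTITIONED BY (db_name, table_name) WITH (\n"
            + with + "\n)";
    }

    /** Query restricted to the partition that holds one source table. */
    static String selectForTable(String flinkTable, String dbName, String tableName) {
        return "SELECT * FROM " + flinkTable
            + " WHERE db_name = '" + escape(dbName) + "'"
            + " AND table_name = '" + escape(tableName) + "'";
    }

    /** Doubles single quotes so the value is a valid SQL string literal. */
    private static String escape(String value) {
        return value.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(createTableDdl("all_tables", "hdfs:///warehouse/all_tables") + ";");
        System.out.println(selectForTable("all_tables", "shop", "orders") + ";");
    }
}
```

   The two generated statements could be passed to `TableEnvironment.executeSql` in a Flink job that has the Hudi Flink bundle on its classpath. Filtering on the partition columns in SQL, rather than after the stream has been read, gives the planner the chance to skip other partitions where the connector supports it.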


