bajiaolong commented on issue #9965: URL: https://github.com/apache/hudi/issues/9965#issuecomment-1789890659
> How many tables are there in your database? This is feasible if you have just a handful of tables, say 20: you can consume the Kafka topic and partition the stream by table name, then pipeline each partitioned stream with a Hudi sink. You need to write some DataStream pipelines manually; see https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/HoodiePipeline.java for a single-table example.

1. Why is this limited to around 20 tables, and what is the reason for the limit?
2. Currently the data of all tables in my database is synchronized into one Hudi table, partitioned by the database schema and table name. When reading downstream, I filter by table name on the stream, but this approach is very time-consuming. Is there a streaming read operation that reads only specified partitions, so that I can get the data of a single table?
3. Do you have any suggestions for the second point?
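
For reference, the routing step suggested in the quoted reply (partition the stream by table name, then give each partition its own sink) can be sketched in plain Java. This is only an illustration under stated assumptions: `TableRouter`, `ChangeRecord`, and `routeByTable` are hypothetical names, and the sketch uses no Flink or Hudi APIs. In a real job each group would presumably be a DataStream wired to its own Hudi sink, for example through the HoodiePipeline class linked above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableRouter {

    /** Hypothetical change record: the source table name plus an opaque payload. */
    public static final class ChangeRecord {
        public final String tableName;
        public final String payload;

        public ChangeRecord(String tableName, String payload) {
            this.tableName = tableName;
            this.payload = payload;
        }
    }

    /** Groups records by table name, keeping tables in first-seen order. */
    public static Map<String, List<ChangeRecord>> routeByTable(List<ChangeRecord> records) {
        Map<String, List<ChangeRecord>> routed = new LinkedHashMap<>();
        for (ChangeRecord r : records) {
            routed.computeIfAbsent(r.tableName, k -> new ArrayList<>()).add(r);
        }
        return routed;
    }

    public static void main(String[] args) {
        List<ChangeRecord> batch = List.of(
                new ChangeRecord("db1.orders", "{\"id\":1}"),
                new ChangeRecord("db1.users", "{\"id\":7}"),
                new ChangeRecord("db1.orders", "{\"id\":2}"));

        // Assumption: in a real pipeline each group would feed a per-table sink
        // instead of being printed.
        routeByTable(batch).forEach((table, rows) ->
                System.out.println(table + " -> " + rows.size() + " record(s)"));
    }
}
```

The point of the sketch is that the split by table happens once, at write time, so downstream readers open only the table they need instead of filtering a combined stream.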
