huyuanfeng2018 opened a new issue, #5955:
URL: https://github.com/apache/paimon/issues/5955

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.
   
   
   ### Motivation
   
   When running a CDC database synchronization job against a database that contains a very large number of tables (hundreds of thousands), the `catalog.listTables(database)` call in `SyncDatabaseActionBase` can take a long time to return. Because this call runs before synchronization starts, the whole job stays blocked until it completes, which significantly delays the startup of the CDC synchronization task.
   
   
   
   
   
   ### Current Behavior
   The current implementation calls `catalog.listTables(database)` during initialization and maintains a `createdTables` set to track which tables have been created. This approach:
   1. Blocks the entire sync process while listing all tables
   2. Consumes unnecessary memory to maintain the `createdTables` set
   3. Performs redundant operations when tables are created lazily
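
   A minimal Java sketch of the current pattern, for illustration only: both the startup cost and the size of the tracking set grow with the total number of tables in the database, not with the number of tables that actually receive data. `EagerTableTracking`, `InMemoryCatalog` and `onRecord` are hypothetical names, and the stub is not Paimon's `Catalog` API.

   ```java
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   /** Sketch of eager table tracking; all names here are hypothetical. */
   public class EagerTableTracking {

       /** Hypothetical catalog stub that counts how many names listTables returns. */
       static class InMemoryCatalog {
           final Set<String> tables = new HashSet<>();
           long namesReturnedByList = 0;

           List<String> listTables(String database) {
               // Returns every table of the database in a single call.
               List<String> names = new ArrayList<>(tables);
               namesReturnedByList += names.size();
               return names;
           }

           void createTable(String name) {
               tables.add(name);
           }
       }

       // Plays the role of the createdTables set: filled up front, kept for the job's lifetime.
       final Set<String> createdTables;
       final InMemoryCatalog catalog;

       EagerTableTracking(InMemoryCatalog catalog, String database) {
           this.catalog = catalog;
           // Startup blocks here until every table name has been listed.
           this.createdTables = new HashSet<>(catalog.listTables(database));
       }

       void onRecord(String table) {
           // Only tables missing from the pre-loaded set trigger a create.
           if (createdTables.add(table)) {
               catalog.createTable(table);
           }
       }

       public static void main(String[] args) {
           InMemoryCatalog catalog = new InMemoryCatalog();
           for (int i = 0; i < 100_000; i++) {
               catalog.createTable("t" + i);
           }
           EagerTableTracking sync = new EagerTableTracking(catalog, "db");
           sync.onRecord("t1");
           sync.onRecord("new_table");
           System.out.println(sync.createdTables.size()); // 100001
           System.out.println(catalog.namesReturnedByList); // 100000
       }
   }
   ```

   Even though only two tables receive records, all 100,000 names are listed before the first record is processed and stay in memory afterwards.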
   
   ### Expected Behavior
   The sync process should:
   1. Avoid blocking on `listTables` operation during initialization
   2. Create tables lazily, when they are first needed, without maintaining a global `createdTables` set
   3. Improve startup performance for databases with a large number of tables
   
   ### Solution
   Optimize the table creation logic by:
   1. Removing the upfront `listTables` call in `SyncDatabaseActionBase`
   2. Eliminating the `createdTables` set from `RichCdcMultiplexRecordEventParser`
   3. Implementing lazy table creation in `CdcDynamicTableParsingProcessFunction` with existence checks
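
   A minimal Java sketch of the lazy approach, assuming a hypothetical `Catalog` interface with `tableExists` and `createTable(..., ignoreIfExists)` methods. It is not Paimon's actual `Catalog` API and not the code of the planned PR; `LazyTableCreation` and `ensureTable` are illustrative names. Nothing is listed at startup; a table is checked against the catalog only the first time it is encountered, and created if it is missing.

   ```java
   import java.util.HashSet;
   import java.util.Set;

   /** Sketch of lazy table creation with an existence check; all names are hypothetical. */
   public class LazyTableCreation {

       /** Hypothetical minimal catalog interface. */
       interface Catalog {
           boolean tableExists(String database, String table);
           void createTable(String database, String table, boolean ignoreIfExists);
       }

       /** Hypothetical in-memory catalog that counts point lookups. */
       static class InMemoryCatalog implements Catalog {
           final Set<String> tables = new HashSet<>();
           int existenceChecks = 0;

           public boolean tableExists(String database, String table) {
               existenceChecks++;
               return tables.contains(database + "." + table);
           }

           public void createTable(String database, String table, boolean ignoreIfExists) {
               boolean added = tables.add(database + "." + table);
               if (!added && !ignoreIfExists) {
                   throw new IllegalStateException("Table already exists: " + table);
               }
           }
       }

       private final Catalog catalog;
       private final String database;
       // Assumption: a per-instance cache of tables this task has already confirmed.
       // It is never pre-populated and only holds tables that appear in the input.
       private final Set<String> confirmedTables = new HashSet<>();

       LazyTableCreation(Catalog catalog, String database) {
           // No listTables() call: startup time does not depend on the number of tables.
           this.catalog = catalog;
           this.database = database;
       }

       void ensureTable(String table) {
           if (confirmedTables.contains(table)) {
               return; // already checked or created by this instance
           }
           if (!catalog.tableExists(database, table)) {
               // ignoreIfExists = true tolerates another parallel task creating it first.
               catalog.createTable(database, table, true);
           }
           confirmedTables.add(table);
       }

       public static void main(String[] args) {
           InMemoryCatalog catalog = new InMemoryCatalog();
           for (int i = 0; i < 100_000; i++) {
               catalog.createTable("db", "t" + i, false);
           }
           LazyTableCreation fn = new LazyTableCreation(catalog, "db");
           fn.ensureTable("t1");
           fn.ensureTable("t1");
           fn.ensureTable("new_table");
           System.out.println(catalog.existenceChecks); // 2
           System.out.println(catalog.tables.contains("db.new_table")); // true
       }
   }
   ```

   The `confirmedTables` cache is a design choice added in this sketch to avoid one catalog lookup per record; unlike the old `createdTables` set it holds only the tables this task has actually seen. Catalog work is therefore proportional to the tables that receive data, not to the size of the database.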
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

