qidian99 opened a new issue, #1563:
URL: https://github.com/apache/incubator-paimon/issues/1563

   ### Search before asking
   
   - [X] I searched in the 
[issues](https://github.com/apache/incubator-paimon/issues) and found nothing 
similar.
   
   
   ### Paimon version
   
   0.5-SNAPSHOT
   
   ### Compute Engine
   
   Flink
   
   ### Minimal reproduce step
   
   When a Flink CDC (Change Data Capture) job is restarted from a savepoint, 
state can only be recovered for operators whose UIDs (or UID hashes) match 
those recorded in the savepoint, so topology changes need careful handling.
   
   For instance, consider a CDC job that captures changes from a source 
database and writes them to a target sink. Suppose the savepoint is taken 
while the job processes tables A and B. If a new table C is then added to the 
source database and the job is restarted from the savepoint, the topology 
changes. Because the per-table operators have no explicit UIDs, their state 
may not be matched correctly on restore and can become unavailable.
   
   To address this, the job must determine which states to restore and which 
to ignore or skip when new tables are introduced. If every affected operator 
is given a stable UID, the state associated with each operator can be 
identified and recovered on restart, which keeps the CDC job reliable and 
complete even when its topology changes.
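
   The restore behaviour described above can be sketched in plain Java. This 
is not Flink or Paimon code: the class `SavepointUidMatch`, the method `match`, 
and UID strings such as `writer:db.A` are all hypothetical. The sketch only 
assumes that state is matched to operators by UID, and shows which operators 
would get their state back after table C is added.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Illustrative only: mimics how savepoint state is matched to operators by UID.
 * Plain Java, not Flink code; all class names and UID strings are hypothetical.
 */
final class SavepointUidMatch {

    /** Outcome of comparing the UIDs in a savepoint with the UIDs in the restarted job. */
    static final class Result {
        final Set<String> restored;     // in both: state is recovered
        final Set<String> startsEmpty;  // only in the restarted job: operator starts without state
        final Set<String> orphaned;     // only in the savepoint: state has no operator to go to

        Result(Set<String> restored, Set<String> startsEmpty, Set<String> orphaned) {
            this.restored = restored;
            this.startsEmpty = startsEmpty;
            this.orphaned = orphaned;
        }
    }

    static Result match(Set<String> savepointUids, Set<String> jobUids) {
        // Operators whose UID exists on both sides get their state back.
        Set<String> restored = new TreeSet<>(jobUids);
        restored.retainAll(savepointUids);

        // Operators that are new in the job have nothing to restore.
        Set<String> startsEmpty = new TreeSet<>(jobUids);
        startsEmpty.removeAll(savepointUids);

        // State whose operator no longer exists has to be skipped explicitly.
        Set<String> orphaned = new TreeSet<>(savepointUids);
        orphaned.removeAll(jobUids);

        return new Result(restored, startsEmpty, orphaned);
    }

    public static void main(String[] args) {
        // Savepoint taken while the job handled tables A and B.
        Set<String> savepoint = Set.of("writer:db.A", "writer:db.B");
        // Restarted job additionally handles table C.
        Set<String> restarted = Set.of("writer:db.A", "writer:db.B", "writer:db.C");

        Result r = match(savepoint, restarted);
        System.out.println("restored    = " + r.restored);
        System.out.println("startsEmpty = " + r.startsEmpty);
        System.out.println("orphaned    = " + r.orphaned);
    }
}
```

   With stable UIDs, adding table C leaves the entries for A and B untouched, 
so only the new operator starts without state. If a table were removed 
instead, its state would land in `orphaned`; in Flink that case requires 
restoring with `--allowNonRestoredState`.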
   
   Note that this issue is specific to Flink CDC jobs that are recovered from 
a savepoint after the job's topology has changed.
   
   ### What doesn't meet your expectations?
   
   Paimon currently does not assign UIDs to the operators created for each 
table. The UIDs should be derived from both the database name and the table 
name; otherwise they can conflict when two tables have the same name but 
belong to different databases.
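
   One possible way to derive such UIDs is sketched below. The class 
`TableOperatorUid`, the `operator:database.table` layout, and the backtick 
quoting are assumptions made for illustration, not an existing Paimon API. 
Quoting each identifier keeps names that contain a dot from colliding.

```java
/**
 * Hypothetical helper that builds a deterministic operator UID from the
 * operator name, the database name, and the table name.
 * Not part of Paimon or Flink; the naming scheme is an assumption.
 */
final class TableOperatorUid {

    private TableOperatorUid() {}

    /** Wraps an identifier in backticks and doubles any embedded backtick. */
    private static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    /**
     * Returns the same string for the same (operator, database, table) triple on
     * every job submission, so the UID stays stable across restarts.
     */
    static String of(String operatorName, String database, String table) {
        return operatorName + ":" + quote(database) + "." + quote(table);
    }

    public static void main(String[] args) {
        // Same table name in two databases yields two distinct UIDs.
        System.out.println(of("writer", "db1", "orders"));
        System.out.println(of("writer", "db2", "orders"));
    }
}
```

   In a Flink job, the resulting string would be passed to the operator's 
`uid(String)` method when the per-table operator is created.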
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!

