neuyilan commented on PR #4844:
URL: https://github.com/apache/paimon/pull/4844#issuecomment-2579130192

   
   
![image](https://github.com/user-attachments/assets/ecaf5c41-b665-402c-a764-ac7d428d1221)
   
   Hi @JingsongLi, thanks again for the advice. I have refactored the 
implementation into the following Flink batch job; please review it again. Thanks.
   1. The first stage picks the tables that need to be cloned. If the 
database parameter is not passed, all tables of all databases are 
cloned. If the table parameter is not passed, all tables of the database 
are cloned. (Not changed, the same as the original design.)
   2. The second stage picks the schema files and copies them to the target 
path. Here "schema files" covers Snapshot, Schema, ManifestList and IndexFile.
   3. The third stage picks the manifest files with a parallelism of 1.
   4. The fourth stage copies or rewrites the manifest files in parallel 
across subtasks. If a manifest file involves an external path, it is 
rewritten; otherwise, it is copied.
   5. The fifth stage picks all the data files with a parallelism of 1. 
(Data files include both Datafile and ChangeLog.)
   6. Shuffle the data files by file name.
   7. The sixth stage copies the data files in parallel across subtasks. 
   8. Shuffle by the target table name to the next stage.
   9. The seventh stage recreates the snapshot hint file. (Not changed, the 
same as the original design.)
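
As a rough illustration of steps 4 and 6, here is a minimal plain-Java sketch of the copy-or-rewrite decision and of routing data files to subtasks by file name. All names here (`CloneSketch`, `ManifestAction`, `actionFor`, `subtaskFor`, `shuffleByFileName`) are hypothetical and do not come from the PR. The hash-modulo routing is only a stand-in for a keyed shuffle and is not how Flink assigns keys to subtasks internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are taken from the PR or from Paimon.
class CloneSketch {

    // Step 4: each manifest file is either copied as-is or rewritten.
    enum ManifestAction { COPY, REWRITE }

    // Assumption: the caller already knows whether an external path is involved.
    static ManifestAction actionFor(boolean usesExternalPath) {
        return usesExternalPath ? ManifestAction.REWRITE : ManifestAction.COPY;
    }

    // Step 6: pick a subtask index for a data file from its file name.
    // floorMod keeps the index non-negative even for negative hash codes.
    static int subtaskFor(String fileName, int parallelism) {
        return Math.floorMod(fileName.hashCode(), parallelism);
    }

    // Groups file names into one bucket per subtask, so the same file name
    // always lands on the same subtask.
    static List<List<String>> shuffleByFileName(List<String> fileNames, int parallelism) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String name : fileNames) {
            buckets.get(subtaskFor(name, parallelism)).add(name);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        List<String> files = List.of("data-0.orc", "data-1.orc", "changelog-0.orc");
        List<List<String>> buckets = shuffleByFileName(files, 2);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("subtask " + i + " -> " + buckets.get(i));
        }
    }
}
```

Keying the shuffle on the file name spreads the copy work across subtasks while keeping the routing deterministic for any given file.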

