Re: [PR] [clone]fix the clone action when we introduced external path [paimon]

via GitHub Tue, 07 Jan 2025 08:13:40 -0800


neuyilan commented on PR #4844:
URL: https://github.com/apache/paimon/pull/4844#issuecomment-2575686738

![image](https://github.com/user-attachments/assets/2f4943d7-a0df-479e-a92f-a6e771c71460)

Hi Jingsong, based on the original design [1] and the discussion above,
I plan to refactor the clone action into the following Flink batch job.
1. The first stage picks the tables that need to be cloned. If the
database parameter is not passed, all tables of all databases are
cloned. If the table parameter is not passed, all tables of the database
are cloned. (Unchanged from the original design.)
2. The second stage picks the related files (Snapshot, Schema, ManifestList,
Manifest, Datafile, ChangeLog, IndexFile) of the snapshot in the source table.
(Unchanged from the original design.)
3. The third stage only copies the schema files to the target path. The
schema files are: Snapshot, Schema, ManifestList and IndexFile.
4. The fourth stage copies or rewrites the manifest files in parallel
across tasks. If an external path is in use, the manifest file is rewritten;
otherwise, it is copied.
5. Shuffle the data files by file name. (Data files are Datafile and
ChangeLog.)
6. The fifth stage copies the data files in parallel across tasks.
7. Shuffle by the target table name to the next stage.
8. The sixth stage recreates the snapshot hint file. (Unchanged from the
original design.)
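
To make the stages above concrete, here is a minimal Java sketch of three of the decisions: table selection (stage 1), copy versus rewrite for manifest files (stage 4), and the shuffle key for data files (step 5). It is only an illustration under stated assumptions: the class `ClonePlanSketch`, its methods, and the `ManifestAction` enum are hypothetical names and not part of the Paimon or Flink API, and plain hash partitioning stands in for whatever shuffle the real job would use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Paimon or Flink.
final class ClonePlanSketch {

    /** What stage four does with a manifest file. */
    enum ManifestAction { COPY, REWRITE }

    /**
     * Stage 1: pick the tables to clone from a catalog listing
     * (database name -> table names). A null database selects every table of
     * every database; a null table selects every table of the given database.
     * Assumption: the table filter is ignored when no database is given.
     */
    static List<String> pickTables(
            Map<String, List<String>> catalog, String database, String table) {
        List<String> picked = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : catalog.entrySet()) {
            if (database != null && !database.equals(entry.getKey())) {
                continue; // a different database was requested
            }
            for (String candidate : entry.getValue()) {
                if (database != null && table != null && !table.equals(candidate)) {
                    continue; // a different table was requested
                }
                picked.add(entry.getKey() + "." + candidate);
            }
        }
        return picked;
    }

    /** Stage 4: rewrite the manifest when an external path is in use, else copy it. */
    static ManifestAction manifestAction(boolean usesExternalPath) {
        return usesExternalPath ? ManifestAction.REWRITE : ManifestAction.COPY;
    }

    /**
     * Step 5: choose the downstream subtask for a data file by hashing its
     * file name, so the same file name always lands on the same subtask.
     */
    static int partitionFor(String fileName, int parallelism) {
        return Math.floorMod(fileName.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        Map<String, List<String>> catalog = new LinkedHashMap<>();
        catalog.put("db1", List.of("t1", "t2"));
        catalog.put("db2", List.of("t3"));
        System.out.println(pickTables(catalog, null, null));
        System.out.println(manifestAction(true));
        System.out.println(partitionFor("data-0.orc", 4));
    }
}
```

In an actual Flink job the shuffle would more likely be expressed through the framework's own partitioning (for example keying the stream by file name) rather than a hand-written hash; the sketch only shows which key drives each decision.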

Please help confirm whether this refactoring is appropriate. Thanks.

[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-18%3A+Introduce+clone+Action+and+Procedure

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
