neuyilan commented on PR #4844:
URL: https://github.com/apache/paimon/pull/4844#issuecomment-2575042180

   > I feel that the current clone process needs to be refactored:
   > 
   > 1. Use a single parallelism to query all manifest files and copy the manifest list file.
   > 2. Read the manifests with distributed parallelism, determine whether each one needs to be rewritten (with or without an external path), and complete the copy or rewrite of the manifest.
   > 3. Shuffle by data file name.
   > 4. Copy the data files in a distributed manner.
   > 
   > This hierarchical approach to copying is the correct solution.
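The four steps above can be sketched as a minimal, in-memory Java model. It is only an illustration under stated assumptions: "files" are path-to-content entries in a `Map<String, String>`, a thread pool stands in for distributed (Flink) subtasks, manifest rewriting for external paths is not modeled, and `HierarchicalCloneSketch`, `cloneTable` and the manifest-name-to-data-files map are hypothetical names, not Paimon APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the proposed hierarchical clone; NOT Paimon code.
// Assumes every path referenced below exists in `source`.
class HierarchicalCloneSketch {

    static Map<String, String> cloneTable(
            String manifestList,
            Map<String, List<String>> manifests, // manifest name -> data files it references
            Map<String, String> source,
            int parallelism) {
        Map<String, String> target = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Step 1 (single parallelism): copy the manifest list. In this model the
            // manifest names are already known from the `manifests` map.
            target.put(manifestList, source.get(manifestList));

            // Step 2 (parallel): copy each manifest and emit the data files it references.
            // A real implementation would decide here whether to rewrite the manifest.
            List<Future<List<String>>> referenced = new ArrayList<>();
            for (Map.Entry<String, List<String>> m : manifests.entrySet()) {
                referenced.add(pool.submit(() -> {
                    target.put(m.getKey(), source.get(m.getKey()));
                    return m.getValue();
                }));
            }

            // Step 3: shuffle by data file name, so each file lands in exactly one bucket.
            List<Set<String>> buckets = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                buckets.add(new HashSet<>());
            }
            for (Future<List<String>> f : referenced) {
                for (String file : await(f)) {
                    buckets.get(Math.floorMod(file.hashCode(), parallelism)).add(file);
                }
            }

            // Step 4 (parallel): each worker copies only the data files in its bucket.
            List<Future<?>> copies = new ArrayList<>();
            for (Set<String> bucket : buckets) {
                copies.add(pool.submit(() -> bucket.forEach(file -> target.put(file, source.get(file)))));
            }
            for (Future<?> c : copies) {
                await(c);
            }
        } finally {
            pool.shutdown();
        }
        return target;
    }

    // Waits for a task and rethrows its failure as an unchecked exception.
    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Because the shuffle assigns each data file name to exactly one bucket, a data file referenced by several manifests is copied once by a single worker rather than once per manifest.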
   
   Thanks for your advice; I will do my best to implement it this way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
