neuyilan commented on PR #4844: URL: https://github.com/apache/paimon/pull/4844#issuecomment-2575042180
> I feel that the current clone process needs to be refactored:
>
> 1. Single parallelism to query all manifest files and copy the manifest list file.
> 2. Read the manifest in a distributed parallelism, determine whether to rewrite it (with or without an external path), and complete the copy or rewrite of the manifest.
> 3. Shuffle by data file name.
> 4. Distributed copy data files.
>
> This hierarchical approach to copying is the correct solution.

Thanks for your advice. I will try my best to do this.
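The four-stage plan quoted above can be sketched in plain Java. This is a minimal, in-memory model under stated assumptions. It is not Paimon's clone code or any Paimon/Flink API. `CloneSketch`, `Manifest`, `needsRewrite`, and `cloneTable` are hypothetical names. "Copying" a file only records a string in a concurrent set that stands in for the target filesystem. Java parallel streams stand in for distributed operators, and hash-partitioning on the file name stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical in-memory model of the four-stage clone plan; NOT Paimon's or Flink's API.
// "Copying" a file just records a line in a concurrent set standing in for the target filesystem.
final class CloneSketch {

    // Hypothetical manifest: its name, whether it must be rewritten (e.g. because it
    // points at an external path), and the data files it references.
    record Manifest(String name, boolean needsRewrite, List<String> dataFiles) {}

    static Set<String> cloneTable(String manifestList, List<Manifest> manifests, int parallelism) {
        Set<String> target = ConcurrentHashMap.newKeySet();

        // Stage 1 (parallelism 1): copy the manifest list; its manifests are passed in here.
        target.add("copy " + manifestList);

        // Stage 2 (parallel): each manifest is either copied as-is or rewritten.
        manifests.parallelStream().forEach(m ->
                target.add((m.needsRewrite() ? "rewrite " : "copy ") + m.name()));

        // Stage 3 (shuffle by data file name): hash-partition so every occurrence of the
        // same file name lands in the same bucket, where the Set de-duplicates it.
        List<Set<String>> buckets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            buckets.add(new LinkedHashSet<>());
        }
        for (Manifest m : manifests) {
            for (String file : m.dataFiles()) {
                buckets.get(Math.floorMod(file.hashCode(), parallelism)).add(file);
            }
        }

        // Stage 4 (parallel): each bucket copies only the data files it owns.
        IntStream.range(0, parallelism).parallel().forEach(i ->
                buckets.get(i).forEach(file -> target.add("copy " + file)));

        return target;
    }

    public static void main(String[] args) {
        List<Manifest> manifests = List.of(
                new Manifest("manifest-1", false, List.of("data-a.parquet", "data-b.parquet")),
                new Manifest("manifest-2", true, List.of("data-b.parquet", "data-c.parquet")));
        cloneTable("manifest-list-0", manifests, 2).stream().sorted().forEach(System.out::println);
    }
}
```

In this model, `data-b.parquet` is referenced by both manifests but is routed to exactly one bucket, so only one copy task handles it. Avoiding duplicate copies of shared data files is one plausible reason to shuffle by data file name before the distributed copy stage.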
