neuyilan commented on code in PR #4844:
URL: https://github.com/apache/paimon/pull/4844#discussion_r1921091780


##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/clone/CloneFileInfo.java:
##########
@@ -18,29 +18,35 @@
 
 package org.apache.paimon.flink.clone;
 
+import javax.annotation.Nullable;
+
 /** The information of copy file. */
 public class CloneFileInfo {
-
-    private final String sourceFilePath;
-    private final String filePathExcludeTableRoot;
+    @Nullable private final String sourceFilePath;
+    @Nullable private final String filePathExcludeTableRoot;
     private final String sourceIdentifier;
     private final String targetIdentifier;
+    private final long snapshotId;

Review Comment:
   I think we cannot remove this field. If the `snapshotId` is not provided, 
then when the job runs in `CopyManifestFilesOperator` we cannot simply pick 
the data files from each manifest file, because those data files may be 
deleted in another manifest file.
   
   For example, in snapshot 1 we add a data file `data-file1.parquet` in 
`manifest-file1`; in snapshot 2 we add a data file `data-file2.parquet` and 
**delete** `data-file1.parquet` in `manifest-file2`. If these two manifest 
files are processed by two separate tasks, the task processing `manifest-file1` 
will try to copy `data-file1.parquet`, and the job will fail.
   
   So we cannot just pick the data files from each manifest file on its own; 
I think we still need the snapshot id. What do you think?
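   To make the scenario concrete, here is a minimal sketch. It uses a hypothetical, simplified model (`FileKind`, `Entry`, and the class `CloneFileSelectionSketch` are illustrative names, not Paimon's real manifest classes). It contrasts a task that only looks at the ADD entries of its own manifest file with replaying, in order, all manifest files visible to one snapshot. Only the second approach sees that `data-file1.parquet` was deleted in `manifest-file2`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CloneFileSelectionSketch {

    // Hypothetical simplified model, NOT Paimon's actual ManifestEntry/FileKind API.
    enum FileKind { ADD, DELETE }

    record Entry(FileKind kind, String fileName) {}

    /** Per-manifest selection: a task only sees the entries of its own manifest file. */
    static Set<String> pickFromSingleManifest(List<Entry> manifest) {
        Set<String> files = new LinkedHashSet<>();
        for (Entry e : manifest) {
            if (e.kind() == FileKind.ADD) {
                files.add(e.fileName());
            }
        }
        return files;
    }

    /** Snapshot-based selection: replay every manifest of the snapshot in commit order. */
    static Set<String> liveFilesOfSnapshot(List<List<Entry>> manifestsInOrder) {
        Set<String> live = new LinkedHashSet<>();
        for (List<Entry> manifest : manifestsInOrder) {
            for (Entry e : manifest) {
                if (e.kind() == FileKind.ADD) {
                    live.add(e.fileName());
                } else {
                    // A DELETE in a later manifest cancels an ADD from an earlier one.
                    live.remove(e.fileName());
                }
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // snapshot 1: manifest-file1 adds data-file1.parquet
        List<Entry> manifest1 = List.of(new Entry(FileKind.ADD, "data-file1.parquet"));
        // snapshot 2: manifest-file2 adds data-file2.parquet and deletes data-file1.parquet
        List<Entry> manifest2 = List.of(
                new Entry(FileKind.ADD, "data-file2.parquet"),
                new Entry(FileKind.DELETE, "data-file1.parquet"));

        // The task for manifest-file1 alone would schedule the deleted file for copying.
        System.out.println(pickFromSingleManifest(manifest1));
        // Resolving against snapshot 2 yields only the file that is still live.
        System.out.println(liveFilesOfSnapshot(List.of(manifest1, manifest2)));
    }
}
```

   Running `main` prints `[data-file1.parquet]` for the per-manifest task and `[data-file2.parquet]` for the snapshot-based resolution. This is why carrying the snapshot id in `CloneFileInfo` lets the operator decide which data files are actually live.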



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.