neuyilan commented on code in PR #4844:
URL: https://github.com/apache/paimon/pull/4844#discussion_r1921091780
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/clone/CloneFileInfo.java:
##########
@@ -18,29 +18,35 @@
package org.apache.paimon.flink.clone;
+import javax.annotation.Nullable;
+
/** The information of copy file. */
public class CloneFileInfo {
-
- private final String sourceFilePath;
- private final String filePathExcludeTableRoot;
+ @Nullable private final String sourceFilePath;
+ @Nullable private final String filePathExcludeTableRoot;
private final String sourceIdentifier;
private final String targetIdentifier;
+ private final long snapshotId;
Review Comment:
I don't think we can remove this field. Without the snapshotId,
CopyManifestFilesOperator cannot simply pick up every data file listed in a
manifest file, because a data file may have been deleted by an entry in another
manifest file.
For example, in snapshot 1 we add one data file `data-file1.parquet` in
`manifest-file1`; in snapshot 2 we add one data file `data-file2.parquet` and
**delete** `data-file1.parquet` in `manifest-file2`. If these two manifest
files are processed in two separate tasks, the task that processes
`manifest-file1` will try to copy `data-file1.parquet`, which may already be
gone, and the job will fail.
So we cannot just pick the data files in a single manifest file. I think we
still need the snapshot id. What do you think?
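
To make the scenario concrete, here is a minimal, self-contained sketch. It is not Paimon's actual code: `ManifestMergeSketch`, `Entry`, `Kind`, `addedIn`, and `liveFiles` are hypothetical names that only model ADD/DELETE entries, assuming manifests are applied in snapshot order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ManifestMergeSketch {

    /** Hypothetical stand-in for the kind of a manifest entry. */
    enum Kind { ADD, DELETE }

    /** Hypothetical stand-in for one entry in a manifest file. */
    static final class Entry {
        final Kind kind;
        final String fileName;

        Entry(Kind kind, String fileName) {
            this.kind = kind;
            this.fileName = fileName;
        }
    }

    /** What a task sees if it only looks at its own manifest file. */
    static Set<String> addedIn(List<Entry> manifest) {
        Set<String> files = new LinkedHashSet<>();
        for (Entry e : manifest) {
            if (e.kind == Kind.ADD) {
                files.add(e.fileName);
            }
        }
        return files;
    }

    /** Files still live after applying every manifest in snapshot order. */
    static Set<String> liveFiles(List<List<Entry>> manifestsInOrder) {
        Set<String> files = new LinkedHashSet<>();
        for (List<Entry> manifest : manifestsInOrder) {
            for (Entry e : manifest) {
                if (e.kind == Kind.ADD) {
                    files.add(e.fileName);
                } else {
                    files.remove(e.fileName);
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<Entry> manifest1 =
                Arrays.asList(new Entry(Kind.ADD, "data-file1.parquet"));
        List<Entry> manifest2 =
                Arrays.asList(
                        new Entry(Kind.ADD, "data-file2.parquet"),
                        new Entry(Kind.DELETE, "data-file1.parquet"));

        // Per-manifest view: the task would try to copy data-file1.parquet.
        System.out.println(addedIn(manifest1));
        // Merged view: only data-file2.parquet is still live.
        System.out.println(liveFiles(Arrays.asList(manifest1, manifest2)));
    }
}
```

The per-manifest view still contains `data-file1.parquet`, while the merged view does not, which is why a task handling `manifest-file1` alone needs extra context such as the snapshot id.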