openinx commented on a change in pull request #1477:
URL: https://github.com/apache/iceberg/pull/1477#discussion_r505250615



##########
File path: 
flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -178,14 +185,20 @@ public void notifyCheckpointComplete(long checkpointId) 
throws Exception {
     }
   }
 
-  private void commitUpToCheckpoint(NavigableMap<Long, List<DataFile>> 
dataFilesMap,
+  private void commitUpToCheckpoint(NavigableMap<Long, Byte[]> manifestsMap,
                                     String newFlinkJobId,
-                                    long checkpointId) {
-    NavigableMap<Long, List<DataFile>> pendingFileMap = 
dataFilesMap.headMap(checkpointId, true);
+                                    long checkpointId) throws IOException {
+    NavigableMap<Long, Byte[]> pendingManifestMap = 
manifestsMap.headMap(checkpointId, true);
+
+    List<ManifestFile> manifestFiles = Lists.newArrayList();
+    for (Byte[] manifestData : pendingManifestMap.values()) {
+      ManifestFile manifestFile = 
ManifestFiles.decode(ArrayUtils.toPrimitive(manifestData));
+      manifestFiles.add(manifestFile);
+    }
 
     List<DataFile> pendingDataFiles = Lists.newArrayList();
-    for (List<DataFile> dataFiles : pendingFileMap.values()) {
-      pendingDataFiles.addAll(dataFiles);
+    for (ManifestFile manifestFile : manifestFiles) {
+      pendingDataFiles.addAll(FlinkManifest.read(manifestFile, table.io()));

Review comment:
       I think you concern is :  the manifests content is so large in an 
extreme case  that it will take much resources to copy those entries ?  If we 
really want to save the copy, we will need : 
   1.   Set the `compatibility.snapshot-id-inheritance.enabled` to be `true`,  
its default value is `false` and in that case it will still copy the contents 
to assign the correct snapshot id.  
https://github.com/apache/iceberg/blob/dc661d417098265de61383650e784d1b39a57209/core/src/main/java/org/apache/iceberg/FastAppend.java#L105
 . 
   2.  Add the necessary `appendManifest` interface and implementation for 
`ReplacePartition`. 
   
   I agree that it's nice to have that improvement, but as the release time 
point is coming, I'd rather not to block this the next release 0.10.0, this 
should not break the user's experience or compatibility.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to