openinx commented on a change in pull request #1477:
URL: https://github.com/apache/iceberg/pull/1477#discussion_r505250615
##########
File path:
flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -178,14 +185,20 @@ public void notifyCheckpointComplete(long checkpointId)
throws Exception {
}
}
- private void commitUpToCheckpoint(NavigableMap<Long, List<DataFile>>
dataFilesMap,
+ private void commitUpToCheckpoint(NavigableMap<Long, Byte[]> manifestsMap,
String newFlinkJobId,
- long checkpointId) {
- NavigableMap<Long, List<DataFile>> pendingFileMap =
dataFilesMap.headMap(checkpointId, true);
+ long checkpointId) throws IOException {
+ NavigableMap<Long, Byte[]> pendingManifestMap =
manifestsMap.headMap(checkpointId, true);
+
+ List<ManifestFile> manifestFiles = Lists.newArrayList();
+ for (Byte[] manifestData : pendingManifestMap.values()) {
+ ManifestFile manifestFile =
ManifestFiles.decode(ArrayUtils.toPrimitive(manifestData));
+ manifestFiles.add(manifestFile);
+ }
List<DataFile> pendingDataFiles = Lists.newArrayList();
- for (List<DataFile> dataFiles : pendingFileMap.values()) {
- pendingDataFiles.addAll(dataFiles);
+ for (ManifestFile manifestFile : manifestFiles) {
+ pendingDataFiles.addAll(FlinkManifest.read(manifestFile, table.io()));
Review comment:
I think you concern is : the manifests content is so large in an
extreme case that it will take much resources to copy those entries ? If we
really want to save the copy, we will need :
1. Set the `compatibility.snapshot-id-inheritance.enabled` to be `true`,
its default value is `false` and in that case it will still copy the contents
to assign the correct snapshot id.
https://github.com/apache/iceberg/blob/dc661d417098265de61383650e784d1b39a57209/core/src/main/java/org/apache/iceberg/FastAppend.java#L105
.
2. Add the necessary `appendManifest` interface and implementation for
`ReplacePartition`.
I agree that it's nice to have that improvement, but as the release time
point is coming, I'd rather not to block this the next release 0.10.0, this
should not break the user's experience or compatibility.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]