Re: [PR] Flink: Refactor WriteResult aggregation in DynamicIcebergSink [iceberg]

via GitHub Mon, 20 Oct 2025 01:15:34 -0700


aiborodin commented on code in PR #14312:
URL: https://github.com/apache/iceberg/pull/14312#discussion_r2444125269



##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java:
##########
@@ -125,26 +126,42 @@ public void prepareSnapshotPreBarrier(long checkpointId) throws IOException {
   }
 
   /**
-   * Write all the completed data files to a newly created manifest file and return the manifest's
+   * Write all the completed data files to a newly created manifest files and return the manifests'
    * avro serialized bytes.
    */
   @VisibleForTesting
-  byte[] writeToManifest(
-      WriteTarget key, Collection<DynamicWriteResult> writeResults, long checkpointId)
+  byte[][] writeToManifests(
+      String tableName, Collection<WriteResult> writeResults, long checkpointId)

Review Comment:
   > I'm leaning towards option 1, because I'm a bit skeptical about other serialization methods, and I think we will need longer time to agree on a way to move forward.
   
   Sure, I added a new committable version, along with a method to deserialize the previous version.
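
   To illustrate the idea of reading both versions, here is a minimal sketch. It is not the PR's code: the class name, the version constants, and both byte layouts (a single length-prefixed manifest for the old version, a counted list of length-prefixed manifests for the new one) are assumptions made for illustration. The `deserialize(int version, byte[] serialized)` shape loosely follows Flink's `SimpleVersionedSerializer`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch; names and byte layouts are assumptions, not the PR's format.
final class VersionedCommittableSketch {
  static final int VERSION_1 = 1; // assumed: one manifest per committable
  static final int VERSION_2 = 2; // assumed: several manifests per committable

  // Old layout: [length][manifest bytes]
  static byte[] serializeV1(byte[] manifest) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(manifest.length);
      out.write(manifest);
    }
    return bytes.toByteArray();
  }

  // New layout: [count] followed by [length][manifest bytes] per manifest
  static byte[] serializeV2(byte[][] manifests) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(manifests.length);
      for (byte[] manifest : manifests) {
        out.writeInt(manifest.length);
        out.write(manifest);
      }
    }
    return bytes.toByteArray();
  }

  // Dispatch on the stored version so state written by the old layout stays readable.
  static byte[][] deserialize(int version, byte[] serialized) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
      switch (version) {
        case VERSION_1:
          // Lift the single old manifest into the new multi-manifest shape.
          return new byte[][] {readChunk(in)};
        case VERSION_2:
          int count = in.readInt();
          byte[][] manifests = new byte[count][];
          for (int i = 0; i < count; i++) {
            manifests[i] = readChunk(in);
          }
          return manifests;
        default:
          throw new IOException("Unrecognized version: " + version);
      }
    }
  }

  private static byte[] readChunk(DataInputStream in) throws IOException {
    byte[] chunk = new byte[in.readInt()];
    in.readFully(chunk);
    return chunk;
  }
}
```

   With this shape, state restored from the old version comes back as a one-element array, so downstream code only has to handle the multi-manifest form.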
   
   > One argument against it is that the multiple manifest serialization doesn't add too much performance gain for us.
   
   The performance gain from this change comes from no longer writing a new manifest for each unique combination of `schemaId`, `upsertMode`, and `equalityFields` in the `WriteTarget`. We now write one only for unique `specIds`.
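
   A small sketch of why the grouping key matters for the manifest count. The `Target` class below is a hypothetical stand-in holding only the fields named in this comment plus a table name; it is not the real `WriteTarget`, and the two counting methods only model "one manifest per distinct key", not the actual manifest writing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; Target is an illustrative stand-in, not Iceberg's WriteTarget.
final class ManifestCountSketch {
  static final class Target {
    final String tableName;
    final int schemaId;
    final int specId;
    final boolean upsertMode;
    final List<String> equalityFields;

    Target(String tableName, int schemaId, int specId, boolean upsertMode,
        List<String> equalityFields) {
      this.tableName = tableName;
      this.schemaId = schemaId;
      this.specId = specId;
      this.upsertMode = upsertMode;
      this.equalityFields = equalityFields;
    }
  }

  // Models the previous grouping: one manifest per distinct full target.
  static int countPerWriteTarget(List<Target> targets) {
    Set<List<Object>> keys = new HashSet<>();
    for (Target t : targets) {
      keys.add(Arrays.<Object>asList(
          t.tableName, t.schemaId, t.specId, t.upsertMode, t.equalityFields));
    }
    return keys.size();
  }

  // Models the new grouping: one manifest per distinct (table, specId) pair.
  static int countPerSpecId(List<Target> targets) {
    Set<List<Object>> keys = new HashSet<>();
    for (Target t : targets) {
      keys.add(Arrays.<Object>asList(t.tableName, t.specId));
    }
    return keys.size();
  }

  public static void main(String[] args) {
    List<Target> targets = Arrays.asList(
        new Target("db.t", 1, 0, false, Collections.<String>emptyList()),
        new Target("db.t", 2, 0, false, Collections.<String>emptyList()),
        new Target("db.t", 2, 0, true, Arrays.asList("id")),
        new Target("db.t", 2, 1, true, Arrays.asList("id")));
    System.out.println(countPerWriteTarget(targets)); // prints 4
    System.out.println(countPerSpecId(targets)); // prints 2
  }
}
```

   In this example, four distinct targets for the same table collapse to two manifests, because only two partition spec ids are involved.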
   
   > In the past, we have always made sure, that users can upgrade their job to a newer Iceberg version without dropping the state. This is important for long running jobs, where in-place upgrade is critical.
   
   In my opinion, this approach makes sense for battle-tested, well-established APIs. The `DynamicIcebergSink` API was released only a few weeks ago and is unlikely to have many users relying on it yet. I think quick iteration and fast resolution of issues are more beneficial at this early stage of the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


