pvary commented on code in PR #14312:
URL: https://github.com/apache/iceberg/pull/14312#discussion_r2438463916


##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java:
##########
@@ -125,26 +126,42 @@ public void prepareSnapshotPreBarrier(long checkpointId) throws IOException {
   }
 
   /**
-   * Write all the completed data files to a newly created manifest file and return the manifest's
+   * Write all the completed data files to a newly created manifest files and return the manifests'
    * avro serialized bytes.
    */
   @VisibleForTesting
-  byte[] writeToManifest(
-      WriteTarget key, Collection<DynamicWriteResult> writeResults, long checkpointId)
+  byte[][] writeToManifests(
+      String tableName, Collection<WriteResult> writeResults, long checkpointId)

Review Comment:
   In the past, we have always made sure that users can upgrade their job to a 
newer Iceberg version without dropping the state. This is important for 
long-running jobs, where an in-place upgrade is critical.
   
   I think we should follow the same pattern here. If we change how we store 
data in the state, then we need to make sure that the old state can still be 
read. This is done by versioning the serializer. The groundwork is there, and 
we need to use it.
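   
   To make the versioning idea concrete, here is a minimal sketch. It mirrors 
the shape of Flink's `SimpleVersionedSerializer` (`getVersion()`, 
`serialize(T)`, `deserialize(int version, byte[])`) without depending on Flink. 
The class name, the version constants, and both byte layouts are hypothetical 
and are not taken from this PR. In particular, it assumes the old state holds 
exactly one manifest as the raw payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch; names and layouts are illustrative, not the PR's actual classes. */
class VersionedManifestState {
  // Assumed old layout: the payload is a single serialized manifest.
  static final int VERSION_SINGLE_MANIFEST = 1;
  // Assumed new layout: manifest count, then length-prefixed manifests.
  static final int VERSION_MULTI_MANIFEST = 2;

  /** New state is always written in the newest layout. */
  static byte[] serialize(byte[][] manifests) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DataOutputStream data = new DataOutputStream(out)) {
      data.writeInt(manifests.length);
      for (byte[] manifest : manifests) {
        data.writeInt(manifest.length);
        data.write(manifest);
      }
    }
    return out.toByteArray();
  }

  /** The version stored next to the bytes selects the read path. */
  static byte[][] deserialize(int version, byte[] serialized) throws IOException {
    switch (version) {
      case VERSION_SINGLE_MANIFEST:
        // State written before the change: wrap the one manifest in an array.
        return new byte[][] {serialized};
      case VERSION_MULTI_MANIFEST:
        try (DataInputStream data =
            new DataInputStream(new ByteArrayInputStream(serialized))) {
          byte[][] manifests = new byte[data.readInt()][];
          for (int i = 0; i < manifests.length; i++) {
            manifests[i] = new byte[data.readInt()];
            data.readFully(manifests[i]);
          }
          return manifests;
        }
      default:
        throw new IOException("Unrecognized version: " + version);
    }
  }
}
```

   With this shape, a job restored from an old checkpoint goes through the 
version 1 branch, while everything written after the upgrade uses version 2, so 
the state does not need to be dropped.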
   
   I understand it is extra work to do if we want to change the serialization 
again, but I'm still not convinced that we have a good solution to that problem.
   
   I see 2 options:
   1. Implement a serialization for the multiple manifests now, and remove it 
if we change it again before the next release.
   2. Block this PR until we agree upon the next serialization solution.
   
   I'm leaning towards option 1, because I'm a bit skeptical about other 
serialization methods, and I think we will need more time to agree on a way 
to move forward.
   
   One argument against it is that the multiple manifest serialization doesn't 
bring much of a performance gain for us. It "just" helps by simplifying the 
committer code.
   
   Your thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

