aiborodin commented on code in PR #14312:
URL: https://github.com/apache/iceberg/pull/14312#discussion_r2444125269
##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicWriteResultAggregator.java:
##########
@@ -125,26 +126,42 @@ public void prepareSnapshotPreBarrier(long checkpointId)
throws IOException {
}
/**
- * Write all the completed data files to a newly created manifest file and return the manifest's
+ * Write all the completed data files to a newly created manifest files and return the manifests'
* avro serialized bytes.
*/
@VisibleForTesting
- byte[] writeToManifest(
- WriteTarget key, Collection<DynamicWriteResult> writeResults, long checkpointId)
+ byte[][] writeToManifests(
+ String tableName, Collection<WriteResult> writeResults, long checkpointId)
Review Comment:
> I'm leaning towards option 1, because I'm a bit skeptical about other
serialization methods, and I think we will need longer time to agree on a way
to move forward.
Sure, to achieve this I added a new committable version, plus a method that
deserializes the previous committable version.
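
To illustrate the idea (a minimal sketch, not the code in this PR): the
deserializer switches on the stored version and maps the old single-manifest
layout onto the new multi-manifest shape. The class name, method names, and
byte layout below are all hypothetical and chosen only for the example.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of version-aware deserialization; not the PR's serializer. */
public class VersionedCommittableSketch {
  public static final int VERSION_1 = 1; // assumed old layout: one manifest
  public static final int VERSION_2 = 2; // assumed new layout: several manifests

  /** Old layout: a single length-prefixed manifest. */
  public static byte[] serializeV1(byte[] manifest) {
    return ByteBuffer.allocate(Integer.BYTES + manifest.length)
        .putInt(manifest.length)
        .put(manifest)
        .array();
  }

  /** New layout: manifest count, followed by length-prefixed manifests. */
  public static byte[] serializeV2(byte[][] manifests) {
    int size = Integer.BYTES;
    for (byte[] manifest : manifests) {
      size += Integer.BYTES + manifest.length;
    }
    ByteBuffer out = ByteBuffer.allocate(size).putInt(manifests.length);
    for (byte[] manifest : manifests) {
      out.putInt(manifest.length).put(manifest);
    }
    return out.array();
  }

  /** Reads either layout and always returns the new multi-manifest shape. */
  public static byte[][] deserialize(int version, byte[] serialized) {
    ByteBuffer in = ByteBuffer.wrap(serialized);
    switch (version) {
      case VERSION_1: {
        // State written by the previous version holds exactly one manifest.
        return new byte[][] {readManifest(in)};
      }
      case VERSION_2: {
        int count = in.getInt();
        byte[][] manifests = new byte[count][];
        for (int i = 0; i < count; i++) {
          manifests[i] = readManifest(in);
        }
        return manifests;
      }
      default:
        throw new IllegalArgumentException("Unrecognized version: " + version);
    }
  }

  private static byte[] readManifest(ByteBuffer in) {
    byte[] manifest = new byte[in.getInt()];
    in.get(manifest);
    return manifest;
  }

  public static void main(String[] args) {
    byte[][] restored = deserialize(VERSION_1, serializeV1(new byte[] {1, 2, 3}));
    System.out.println(restored.length); // old state restores as one manifest
  }
}
```

With this shape, state written by the previous version restores as a
one-element array, so the downstream code only has to handle the new format.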
> One argument against it is that the multiple manifest serialization
doesn't add too much performance gain for us.
We do get a performance gain from this change: we no longer have to write a
new manifest for each unique combination of `schemaId`, `upsertMode`, and
`equalityFields` in the `WriteTarget`, only one for each unique `specId`.
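
A rough sketch of the difference in manifest counts, assuming a simplified
stand-in for `WriteTarget` (the `Target` and `TableSpec` records and both
counting methods are hypothetical, not Iceberg classes):

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch comparing the two grouping keys; not Iceberg code. */
public class ManifestGroupingSketch {
  /** Simplified stand-in for the WriteTarget fields discussed above. */
  public record Target(
      String tableName,
      int schemaId,
      int specId,
      boolean upsertMode,
      Set<Integer> equalityFields) {}

  /** Grouping key after the change: table name plus partition spec ID. */
  public record TableSpec(String tableName, int specId) {}

  /** Before: one manifest per distinct combination of all target fields. */
  public static long manifestsPerWriteTarget(List<Target> targets) {
    return targets.stream().distinct().count();
  }

  /** After: one manifest per distinct (table, specId) pair. */
  public static long manifestsPerSpec(List<Target> targets) {
    return targets.stream()
        .map(t -> new TableSpec(t.tableName(), t.specId()))
        .distinct()
        .count();
  }

  public static void main(String[] args) {
    // Three schema versions of one table, all written with the same spec.
    List<Target> targets =
        List.of(
            new Target("db.t", 1, 0, false, Set.of()),
            new Target("db.t", 2, 0, false, Set.of()),
            new Target("db.t", 3, 0, false, Set.of()));
    System.out.println(manifestsPerWriteTarget(targets)); // 3
    System.out.println(manifestsPerSpec(targets)); // 1
  }
}
```

The saving grows with the number of schema or equality-field variations seen
for a table within a checkpoint, since those no longer force separate manifests.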
> In the past, we have always made sure, that users can upgrade their job to
a newer Iceberg version without dropping the state. This is important for long
running jobs, where in-place upgrade is critical.
In my opinion, this approach makes sense for battle-tested, well-established
APIs. The `DynamicIcebergSink` API was released only a few weeks ago and is
unlikely to have many users relying on it yet. I think quick iteration and
fast resolution of issues are more beneficial at this early stage of the code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]