0xffmeta opened a new issue, #4552: URL: https://github.com/apache/iceberg/issues/4552
I'm trying to use the `upsert` mode in FlinkSink to stream-write records, but I found that the data files are duplicated, and the same goes for the manifest files. I'm not sure whether this is expected for `upsert` in FlinkSink.

<img width="1195" alt="image" src="https://user-images.githubusercontent.com/98149057/163225844-ff72c816-641c-48ca-b3cd-70c19d785708.png">

<img width="1071" alt="image" src="https://user-images.githubusercontent.com/98149057/163226134-cbd0d835-cae2-4a03-a68e-2c722dc978e3.png">

From the writer code, I can see that it first deletes the row and then writes the row. Is there any way to optimize this?

https://github.com/apache/iceberg/blob/3f5230d312c5b0630681a18da0f30439ba7f6982/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java#L83
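
To make the delete-then-write behavior concrete, here is a minimal sketch in plain Java. It is a toy in-memory model and not Iceberg's API: `UpsertSketch`, `upsert`, `read`, and the two counters are hypothetical names introduced only for illustration. The assumption is that every upserted row produces one delete record for its key plus one data record, so the volume written grows with the number of upserts even though a reader only sees the latest row per key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and are not part of Iceberg or Flink.
class UpsertSketch {
    // Ordered log of operations: {"D", key, null} for a delete, {"I", key, value} for an insert.
    private final List<String[]> log = new ArrayList<>();
    private int dataRecords = 0;   // rows that would land in data files
    private int deleteRecords = 0; // keys that would land in delete files

    // Upsert is modeled as "delete by key, then insert the new row".
    public void upsert(String key, String value) {
        log.add(new String[] {"D", key, null});
        deleteRecords++;
        log.add(new String[] {"I", key, value});
        dataRecords++;
    }

    public int dataRecordCount() {
        return dataRecords;
    }

    public int deleteRecordCount() {
        return deleteRecords;
    }

    // Reader side: replay the log in write order so a delete only hides earlier rows.
    public Map<String, String> read() {
        Map<String, String> live = new LinkedHashMap<>();
        for (String[] op : log) {
            if (op[0].equals("D")) {
                live.remove(op[1]);
            } else {
                live.put(op[1], op[2]);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        UpsertSketch table = new UpsertSketch();
        table.upsert("id-1", "a");
        table.upsert("id-1", "b");
        table.upsert("id-2", "c");
        System.out.println("data records written:   " + table.dataRecordCount());
        System.out.println("delete records written: " + table.deleteRecordCount());
        System.out.println("rows visible to reader: " + table.read());
    }
}
```

In this model, three upserts write three data records and three delete records, while the reader sees only two live rows. Under that assumption, extra files per write are a consequence of the pattern itself rather than a sign of corrupted output.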
