0xffmeta opened a new issue, #4552:
URL: https://github.com/apache/iceberg/issues/4552

   I'm trying to use `upsert` mode in FlinkSink to stream-write records, 
but I found that the data files appear to be duplicated, and likewise the 
manifest files. I'm not sure whether this is expected for `upsert` mode in FlinkSink.
   <img width="1195" alt="image" 
src="https://user-images.githubusercontent.com/98149057/163225844-ff72c816-641c-48ca-b3cd-70c19d785708.png">
   <img width="1071" alt="image" 
src="https://user-images.githubusercontent.com/98149057/163226134-cbd0d835-cae2-4a03-a68e-2c722dc978e3.png">
   
   From the writer code, I can see that it first deletes the row and then 
writes the row. I'm not sure whether there is any way to optimize this.
   
https://github.com/apache/iceberg/blob/3f5230d312c5b0630681a18da0f30439ba7f6982/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java#L83
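
   To illustrate the behaviour I mean, here is a toy sketch, assuming I am reading the writer correctly. `UpsertSketch`, `upsert`, and `ops` are hypothetical names for illustration only; this is not the Iceberg code and it does not write any files. It only shows that each incoming row produces a delete followed by an insert, even when the key has never been written before.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of delete-then-write upsert; NOT the Iceberg implementation. */
public class UpsertSketch {
    // Operations in the order the sketch would emit them.
    private final List<String> ops = new ArrayList<>();

    /** Every upserted row is recorded as a delete by key, then an insert. */
    public void upsert(String key, String value) {
        ops.add("DELETE " + key);                 // remove any earlier row with this key
        ops.add("INSERT " + key + " -> " + value); // then write the new row
    }

    public List<String> ops() {
        return ops;
    }

    public static void main(String[] args) {
        UpsertSketch writer = new UpsertSketch();
        writer.upsert("id=1", "a"); // first time this key is seen
        writer.upsert("id=1", "b"); // same key again
        writer.ops().forEach(System.out::println);
    }
}
```

   In this sketch, two upserts of the same key yield four operations, two deletes and two inserts.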
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

