liubo1022126 commented on pull request #3093:
URL: https://github.com/apache/iceberg/pull/3093#issuecomment-916904105


   > @liubo1022126 Instead of carrying over watermark info into `WriteResult`, 
I am wondering if we should have `IcebergFilesCommitter` override this method 
from `AbstractStreamOperator` to intercept watermark advancement.
   > 
   > ```
   >     public void processWatermark(Watermark mark) throws Exception 
   > ```
   > 
   > Also, maybe revert the unrelated formatting change.
   
   Thanks @stevenzwu. I also considered using Flink's watermark to carry 
Iceberg's data advancement information, but I rejected it later, mainly for 
the following reasons: 
   
   1. What a Flink watermark is: it mainly addresses the out-of-order problem 
caused by late-arriving data. Out-of-order data is very common in real-time 
streams, and reordering is necessary.
   2. What an Iceberg watermark is: it is recorded when writing data, to save 
the time progress of the data written. It has nothing to do with reordering 
out-of-order data, so its design intent differs from that of the Flink 
watermark.
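
   To make the distinction in the two points above concrete, here is a minimal 
sketch, not the code in this PR. The class and method names are hypothetical, 
and treating "data advancement" as the largest event time written so far is 
only my assumption about one possible definition. The Flink-style value lags 
the largest event time by an out-of-orderness bound, in the same way as 
Flink's bounded-out-of-orderness watermark generator.

```java
import java.util.List;

// Hypothetical sketch: none of these names exist in Flink or Iceberg.
public class WatermarkSketch {

    // Largest event time in the batch; rejects empty input to avoid overflow below.
    static long maxEventTime(List<Long> eventTimes) {
        if (eventTimes.isEmpty()) {
            throw new IllegalArgumentException("eventTimes must not be empty");
        }
        long max = Long.MIN_VALUE;
        for (long t : eventTimes) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Flink-style watermark: trails the largest event time seen so far by a
    // fixed bound, so that late records can still be put in order downstream.
    static long flinkStyleWatermark(List<Long> eventTimes, long maxOutOfOrdernessMs) {
        return maxEventTime(eventTimes) - maxOutOfOrdernessMs - 1;
    }

    // Assumed definition of "data advancement": how far the written data has
    // progressed in event time. No reordering is involved.
    static long dataAdvancement(List<Long> writtenEventTimes) {
        return maxEventTime(writtenEventTimes);
    }

    public static void main(String[] args) {
        List<Long> eventTimes = List.of(1000L, 3000L, 2000L); // arrives out of order
        System.out.println(flinkStyleWatermark(eventTimes, 500L)); // prints 2499
        System.out.println(dataAdvancement(eventTimes));           // prints 3000
    }
}
```

   The two values answer different questions: the first says which records 
may still arrive late, the second only says how far the written data has 
progressed.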
   
   So although we could use Flink's watermark mechanism to implement the 
Iceberg watermark, if we need to support reordering of out-of-order data in 
Iceberg in the future, the Flink watermark and the Iceberg watermark will 
cause confusion.
   I also have doubts about using the term [watermark] for the Iceberg 
watermark. Might "data advancement" be a better name?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


