liubo1022126 commented on pull request #3093: URL: https://github.com/apache/iceberg/pull/3093#issuecomment-916904105
> @liubo1022126 Instead of carrying over watermark info into `WriteResult`, I am wondering if we should have `IcebergFilesCommitter` override this method from `AbstractStreamOperator` to intercept watermark advancement.
>
> ```
> public void processWatermark(Watermark mark) throws Exception
> ```
>
> Also maybe revert unrelated formatting change, thanks

@stevenzwu, I also considered using Flink's watermark to carry Iceberg's data advancement information, but I later rejected the idea, mainly for the following reasons:

1. What the Flink watermark is: it mainly exists to handle late, out-of-order data. Out-of-order data is very common in real-time streams, so reordering is necessary.
2. What the Iceberg watermark is: it is recorded when data is written and saves the time progress of the written data. It has nothing to do with reordering out-of-order data, so its design intent differs from that of the Flink watermark.

So although we could use Flink's watermark mechanism to implement the Iceberg watermark, if Iceberg needs to support reordering of out-of-order data in the future, the Flink watermark and the Iceberg watermark would be easy to confuse.

I also have doubts about using the word "watermark" for the Iceberg concept. "Data advancement" might be a better name.
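To make the distinction in points 1 and 2 concrete, here is a minimal, self-contained sketch. It uses no Flink or Iceberg classes. `AdvancementDemo`, `EventTimeWatermark`, and `DataAdvancementTracker` are hypothetical names invented for illustration, and the sketch assumes that "data advancement" means the largest event timestamp among records that have actually been committed. The PR itself may define it differently.

```java
// Hypothetical illustration only: these classes are not part of Flink or Iceberg.
final class AdvancementDemo {

    /** Flink-style idea: a watermark trails the max seen timestamp to tolerate out-of-order events. */
    static final class EventTimeWatermark {
        private final long maxOutOfOrdernessMs;
        private long maxSeenTs = Long.MIN_VALUE;

        EventTimeWatermark(long maxOutOfOrdernessMs) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        }

        void onEvent(long eventTs) {
            maxSeenTs = Math.max(maxSeenTs, eventTs);
        }

        /** Returns maxSeen - allowedLateness - 1, or Long.MIN_VALUE if no event has been seen. */
        long current() {
            return maxSeenTs == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeenTs - maxOutOfOrdernessMs - 1;
        }
    }

    /** Assumed "data advancement": time progress of written data, published only on commit. */
    static final class DataAdvancementTracker {
        private long pendingMaxTs = Long.MIN_VALUE;
        private long committedTs = Long.MIN_VALUE;

        void onWrite(long eventTs) {
            pendingMaxTs = Math.max(pendingMaxTs, eventTs);
        }

        /** Called when a commit (for example, at a checkpoint) succeeds. */
        long commit() {
            committedTs = Math.max(committedTs, pendingMaxTs);
            return committedTs;
        }

        long committed() {
            return committedTs;
        }
    }

    public static void main(String[] args) {
        EventTimeWatermark wm = new EventTimeWatermark(500);
        DataAdvancementTracker adv = new DataAdvancementTracker();
        for (long ts : new long[] {1000L, 3000L, 2000L}) { // 2000 arrives out of order
            wm.onEvent(ts);
            adv.onWrite(ts);
        }
        System.out.println("watermark = " + wm.current());
        System.out.println("advancement before commit = " + adv.committed());
        System.out.println("advancement after commit = " + adv.commit());
    }
}
```

In this sketch the watermark moves as events arrive and includes a lateness allowance, while the advancement value changes only when a commit happens. That difference in meaning is the reason given above for keeping the two concepts, and possibly their names, separate.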
