stevenzwu commented on pull request #2109: URL: https://github.com/apache/iceberg/pull/2109#issuecomment-769542927
Using the Flink watermark is definitely a very reasonable approach. We actually implemented this with a slightly different way of calculating the watermark. Instead of using the Flink watermark, we add some additional metadata (min, max, sum, count) per DataFile for the timestamp column. In the committer, we use the min of the per-file mins to decide the watermark value, and we never regress the watermark value. This metadata also helps us calculate metrics for ingestion latency (commit time - event/Kafka time), such as min, max, and avg.
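To make the idea concrete, here is a rough sketch of the committer-side logic described above. It is not the actual implementation: `WatermarkSketch`, `TimestampStats`, `commit`, and `avgLatencyMillis` are hypothetical names, the stats are assumed to be epoch milliseconds, and how the stats are attached to each DataFile is left out entirely.

```java
import java.util.List;

public class WatermarkSketch {

    /** Hypothetical per-DataFile stats for the timestamp column (assumed epoch millis). */
    public static final class TimestampStats {
        final long min;
        final long max;
        final long sum;
        final long count;

        public TimestampStats(long min, long max, long sum, long count) {
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.count = count;
        }
    }

    // Last committed watermark; starts at the lowest possible value.
    private long watermark = Long.MIN_VALUE;

    /** Takes the min of the per-file mins, but never moves the watermark backwards. */
    public long commit(List<TimestampStats> files) {
        long candidate = Long.MAX_VALUE;
        for (TimestampStats s : files) {
            if (s.count > 0) {
                candidate = Math.min(candidate, s.min);
            }
        }
        // Only advance when at least one non-empty file was seen.
        if (candidate != Long.MAX_VALUE) {
            watermark = Math.max(watermark, candidate);
        }
        return watermark;
    }

    /** Average ingestion latency: commit time minus the mean event time across files. */
    public static double avgLatencyMillis(long commitTimeMillis, List<TimestampStats> files) {
        long sum = 0;
        long count = 0;
        for (TimestampStats s : files) {
            sum += s.sum;
            count += s.count;
        }
        return count == 0 ? Double.NaN : commitTimeMillis - (double) sum / count;
    }
}
```

The min and max latency fall out the same way: min latency is commit time minus the largest per-file max, and max latency is commit time minus the smallest per-file min.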
