liubo1022126 commented on pull request #2109: URL: https://github.com/apache/iceberg/pull/2109#issuecomment-898322236
> We actually implemented this in a slightly different way of calculating the watermark. Instead of using Flink watermark, we add some additional metadata (min, max, sum, count) per DataFile for the timestamp column. In the committer, we use the min of min to decide the watermark value. We never regress the watermark value. Those metadata can also help us calculate metrics for ingestion latency (commit time - event/Kafka time): like min, max, avg.
>
> Just to share, by no means that I am suggesting changing the approach in this PR. It is perfectly good.

Thanks @stevenzwu @rdblue, that sounds great! We also need to embed the Iceberg table, which we treat as a real-time table, into our workflow. Is there any doc or patch for your implementation?
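
To check my understanding of the quoted approach, here is a minimal sketch of how I read it. It is not the actual implementation: `TimestampStats`, `WatermarkTracker`, and `ingestion_latency` are hypothetical names, timestamps are assumed to be epoch milliseconds, and nothing here uses the Iceberg or Flink APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class TimestampStats:
    """Hypothetical per-DataFile stats for the timestamp column (epoch millis)."""
    min_ts: int
    max_ts: int
    sum_ts: int
    count: int


class WatermarkTracker:
    """Keeps a watermark that only moves forward across commits."""

    def __init__(self) -> None:
        self.watermark: Optional[int] = None

    def on_commit(self, files: Iterable[TimestampStats]) -> Optional[int]:
        # "min of min": the smallest per-file minimum in this commit.
        mins = [f.min_ts for f in files if f.count > 0]
        if mins:
            candidate = min(mins)
            # Never regress: ignore a candidate older than the current value.
            if self.watermark is None or candidate > self.watermark:
                self.watermark = candidate
        return self.watermark


def ingestion_latency(
    files: List[TimestampStats], commit_ts: int
) -> Optional[Dict[str, float]]:
    """Latency = commit time - event time, summarised as min, max, avg."""
    files = [f for f in files if f.count > 0]
    if not files:
        return None
    total = sum(f.count for f in files)
    avg_event_ts = sum(f.sum_ts for f in files) / total
    return {
        "min": commit_ts - max(f.max_ts for f in files),  # newest event
        "max": commit_ts - min(f.min_ts for f in files),  # oldest event
        "avg": commit_ts - avg_event_ts,
    }
```

If this matches, a commit whose files all carry late data (a smaller min than the current watermark) would leave the watermark unchanged, and the sum and count fields are what make the average latency computable without rereading the files.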
