liubo1022126 commented on pull request #2109:
URL: https://github.com/apache/iceberg/pull/2109#issuecomment-898322236


   > We actually implemented this with a slightly different way of calculating 
the watermark. Instead of using the Flink watermark, we add some additional 
metadata (min, max, sum, count) per DataFile for the timestamp column. In the 
committer, we use the min of the per-file mins to decide the watermark value. 
We never regress the watermark value. That metadata also helps us calculate 
ingestion-latency metrics (commit time - event/Kafka time), such as min, max, avg.
   > 
   > Just to share: I am by no means suggesting a change to the approach in 
this PR. It is perfectly good.
   
   Thanks @stevenzwu @rdblue, that sounds great! We also need to embed the Iceberg 
table, which is regarded as a real-time table, into our workflow. Is there any 
doc or patch for your implementation?
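
   For reference, my reading of the quoted approach is roughly the sketch below. It is only an illustration of the idea (per-file min/max/sum/count for the timestamp column, watermark = min of the per-file mins, never moving backwards), not the actual implementation. The names `WatermarkSketch`, `TimestampStats`, `WatermarkTracker`, and `avgLatencyMillis` are hypothetical and are not Iceberg or Flink APIs; timestamps are assumed to be epoch milliseconds.

```java
import java.util.List;

public class WatermarkSketch {

    /** Hypothetical per-DataFile stats for the timestamp column (epoch millis). */
    public static final class TimestampStats {
        final long min, max, sum, count;

        public TimestampStats(long min, long max, long sum, long count) {
            this.min = min;
            this.max = max;
            this.sum = sum;   // note: a raw sum of epoch millis can overflow for very large files
            this.count = count;
        }
    }

    /** Hypothetical committer-side state holding the last published watermark. */
    public static final class WatermarkTracker {
        private long watermark = Long.MIN_VALUE;

        /** Takes the min of the per-file mins, but never lets the watermark regress. */
        public long onCommit(List<TimestampStats> files) {
            long candidate = files.stream()
                    .mapToLong(f -> f.min)
                    .min()
                    .orElse(watermark); // no files in this commit: keep the current value
            watermark = Math.max(watermark, candidate);
            return watermark;
        }
    }

    /** Average ingestion latency: commit time minus the mean event time over all files. */
    public static double avgLatencyMillis(long commitTimeMillis, List<TimestampStats> files) {
        long sum = 0;
        long count = 0;
        for (TimestampStats f : files) {
            sum += f.sum;
            count += f.count;
        }
        return count == 0 ? Double.NaN : commitTimeMillis - (double) sum / count;
    }

    public static void main(String[] args) {
        WatermarkTracker tracker = new WatermarkTracker();
        List<TimestampStats> first = List.of(
                new TimestampStats(100, 200, 300, 2),
                new TimestampStats(150, 250, 400, 2));
        System.out.println(tracker.onCommit(first));
        // A late file with an older min must not pull the watermark back.
        System.out.println(tracker.onCommit(List.of(new TimestampStats(50, 60, 110, 2))));
        System.out.println(avgLatencyMillis(1000, first));
    }
}
```

   Min and max latency would follow the same pattern (commit time minus the max of maxes, and commit time minus the min of mins, respectively).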


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


