dixingxing0 opened a new issue #2108:
URL: https://github.com/apache/iceberg/issues/2108


We need a flag to measure the progress of data writing in some cases. I think it's reasonable to store a `watermark` as an Iceberg table property.
   
One of our scenarios: Flink writes data into an Iceberg table in real time, and we then use the same table for batch computation, so we need a flag to evaluate whether a partition's data is complete.
   
For example, job1 is scheduled every hour. At 2021-01-19 02:01:00, job1 begins to check whether partition `20210119/01` of `iceberg_table1` is finished (Flink writes data into `iceberg_table1` in real time). Once the `watermark` in `iceberg_table1`'s properties is newer than 2021-01-19 02:05:00 (allowing 5 minutes of out-of-order data), we treat partition `20210119/01` as complete and can safely run Hive or Flink SQL for the batch computation (basically ``insert into `table2` select xx from `iceberg_table1` ...``).
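The readiness check in the example above can be sketched in Java. This is a minimal sketch, not an Iceberg API. The following choices are hypothetical and are not part of the proposal:

- the property key `watermark`;
- storing its value as epoch milliseconds;
- interpreting hourly partitions like `20210119/01` in UTC;
- the class and method names.

The properties map has the same shape as the `Map<String, String>` returned by Iceberg's `Table.properties()`.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

class PartitionReadiness {

    // Hypothetical table property key; the issue does not fix a name or format.
    static final String WATERMARK_PROPERTY = "watermark";

    // Out-of-order allowance from the example (5 minutes).
    static final Duration ALLOWED_LATENESS = Duration.ofMinutes(5);

    /** Start of an hourly partition such as "20210119/01", interpreted in UTC (an assumption). */
    static Instant hourStart(String partition) {
        LocalDate day = LocalDate.parse(partition.substring(0, 8), DateTimeFormatter.BASIC_ISO_DATE);
        int hour = Integer.parseInt(partition.substring(9));
        return day.atTime(hour, 0).toInstant(ZoneOffset.UTC);
    }

    /**
     * True when the stored watermark is newer than the partition's end plus the
     * allowed lateness. A missing watermark means the partition is not ready yet.
     */
    static boolean isPartitionComplete(Map<String, String> tableProperties, String partition) {
        String raw = tableProperties.get(WATERMARK_PROPERTY);
        if (raw == null) {
            return false;
        }
        // Assumes the writer stores the watermark as epoch milliseconds.
        Instant watermark = Instant.ofEpochMilli(Long.parseLong(raw));
        Instant readyAfter = hourStart(partition).plus(Duration.ofHours(1)).plus(ALLOWED_LATENESS);
        return watermark.isAfter(readyAfter);
    }

    public static void main(String[] args) {
        // In a real job the map would come from the Iceberg table's properties.
        long at0204 = Instant.parse("2021-01-19T02:04:00Z").toEpochMilli();
        long at0206 = Instant.parse("2021-01-19T02:06:00Z").toEpochMilli();
        System.out.println(isPartitionComplete(Map.of(WATERMARK_PROPERTY, Long.toString(at0204)), "20210119/01")); // false
        System.out.println(isPartitionComplete(Map.of(WATERMARK_PROPERTY, Long.toString(at0206)), "20210119/01")); // true
    }
}
```

With the 5-minute allowance, a watermark of 02:04 leaves partition `20210119/01` pending, and a watermark of 02:06 marks it complete. The strict `isAfter` comparison mirrors the "newer than" wording in the example.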

