dixingxing0 edited a comment on pull request #2109:
URL: https://github.com/apache/iceberg/pull/2109#issuecomment-762679911


   > @dixingxing0, in our implementation, we store watermarks in snapshot 
summary metadata. I think that's a more appropriate place for it because it is 
metadata about the snapshot that is produced. We also use a watermark per 
writer because we write in 3 different AWS regions. So I think it would make 
sense to be able to name each watermark, possibly with a default if you choose 
not to name it.
   
   Thanks @rdblue, I agree with you.
   Since we will store the watermark in snapshot summary metadata, we should also 
consider the rewrite action. Currently the rewrite action loses the extra 
properties in summary metadata, e.g. 
`flink.max-committed-checkpoint-id`, `flink.job-id`. 
   I think we should copy the extra properties from the current snapshot to 
`RewriteFiles` (the new snapshot), but I am not sure this would work as 
expected: Flink will continuously produce new snapshots, and I'm not sure 
how Iceberg will resolve the conflict. I'll do some tests first.
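   
   A rough sketch of the copy step, under assumptions: `SummaryCarryOver` and 
`extraProperties` are hypothetical names, and the summary is modeled as a plain 
`Map<String, String>`. It only selects the `flink.`-prefixed entries from the 
previous summary. Applying them to the rewrite commit (presumably through the 
snapshot update's `set(key, value)`) and handling a concurrent Flink commit are 
not covered here.
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical helper for illustration; not part of Iceberg or Flink.
   class SummaryCarryOver {
     // Returns the entries of the previous snapshot summary whose keys start
     // with the given prefix, preserving their original order.
     static Map<String, String> extraProperties(
         Map<String, String> previousSummary, String prefix) {
       Map<String, String> carried = new LinkedHashMap<>();
       for (Map.Entry<String, String> entry : previousSummary.entrySet()) {
         if (entry.getKey().startsWith(prefix)) {
           carried.put(entry.getKey(), entry.getValue());
         }
       }
       return carried;
     }
   }
   ```
   Filtering by prefix keeps Iceberg's own summary fields (e.g. `operation`) from 
being copied into the new snapshot.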
   
   To name the watermark, I think we can introduce a new configuration 
`flink.watermark-name`:
   ```java
   // user-specified configuration
   flink.store-watermark=false   // default
   flink.watermark-name=default  // default
   
   // written by the Flink file committer; the flink.watermark-name value is the suffix
   flink.watermark-for-default=the-watermark
   ```
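   
   A minimal sketch of how the committer could derive the summary key from the 
configuration above. `WatermarkSummary`, `keyFor`, and `summaryEntry` are 
hypothetical names, and representing the watermark as epoch milliseconds is an 
assumption; only the key layout comes from the proposal.
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical helper for illustration; not an existing Iceberg or Flink class.
   class WatermarkSummary {
     static final String KEY_PREFIX = "flink.watermark-for-";
     static final String DEFAULT_NAME = "default";

     // Builds the summary key, falling back to the default name when none is set.
     static String keyFor(String watermarkName) {
       boolean unnamed = watermarkName == null || watermarkName.isEmpty();
       return KEY_PREFIX + (unnamed ? DEFAULT_NAME : watermarkName);
     }

     // Returns the summary entry to write, or an empty map when
     // flink.store-watermark is false. The long watermark is an assumption.
     static Map<String, String> summaryEntry(
         boolean storeWatermark, String watermarkName, long watermarkMillis) {
       Map<String, String> props = new LinkedHashMap<>();
       if (storeWatermark) {
         props.put(keyFor(watermarkName), Long.toString(watermarkMillis));
       }
       return props;
     }
   }
   ```
   With one name per writer (for example one per region), each writer gets its own 
key, so writers do not overwrite each other's watermark.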
   
   @rdblue, what do you think?




