dixingxing0 edited a comment on pull request #2109: URL: https://github.com/apache/iceberg/pull/2109#issuecomment-762679911
> @dixingxing0, in our implementation, we store watermarks in snapshot summary metadata. I think that's a more appropriate place for it because it is metadata about the snapshot that is produced. We also use a watermark per writer because we write in 3 different AWS regions. So I think it would make sense to be able to name each watermark, possibly with a default if you choose not to name it.

Thanks @rdblue, I agree with you.

Since we will store the watermark in snapshot summary metadata, we should also consider the rewrite action. Currently the rewrite action loses the extra properties in the summary metadata, e.g. `flink.max-committed-checkpoint-id` and `flink.job-id`. I think we should copy the extra properties from the current snapshot to `RewriteFiles` (the new snapshot), but I am not sure this would work as expected: Flink will continuously produce new snapshots, and I'm not sure how Iceberg will resolve the conflict. I'll do some tests first.

As for naming the watermark, I think we can introduce a new configuration, `flink.watermark-name`:

```java
// user-specified configuration
flink.store-watermark=false   // default
flink.watermark-name=default  // default

// written by the Flink file committer
flink.watermark-for-default=the-watermark  // uses flink.watermark-name as the suffix
```

@rdblue what do you think?
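To make the two ideas above concrete, here is a minimal sketch in plain Java with no Iceberg or Flink dependency. It is not the PR's implementation. The class `WatermarkSummarySketch` and its helper methods are hypothetical names introduced only for illustration. The `flink.watermark-for-<name>` key layout follows the proposal in this comment, and the assumption that every `flink.`-prefixed summary entry should be carried over is mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not part of Iceberg or Flink.
public class WatermarkSummarySketch {
    // Assumption: all Flink-owned summary properties share this prefix.
    static final String FLINK_PREFIX = "flink.";
    // Key layout proposed in this comment: flink.watermark-for-<name>.
    static final String WATERMARK_KEY_PREFIX = "flink.watermark-for-";

    // Builds the summary key for a named watermark, falling back to "default".
    static String watermarkKey(String watermarkName) {
        String name = (watermarkName == null || watermarkName.isEmpty())
                ? "default"
                : watermarkName;
        return WATERMARK_KEY_PREFIX + name;
    }

    // Selects the entries a rewrite would need to copy into the new snapshot's
    // summary so that they are not lost.
    static Map<String, String> carryOverFlinkProperties(Map<String, String> currentSummary) {
        Map<String, String> carried = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : currentSummary.entrySet()) {
            if (entry.getKey().startsWith(FLINK_PREFIX)) {
                carried.put(entry.getKey(), entry.getValue());
            }
        }
        return carried;
    }

    public static void main(String[] args) {
        // Example summary of the current snapshot (all values are made up).
        Map<String, String> summary = new LinkedHashMap<>();
        summary.put("operation", "append");
        summary.put("flink.job-id", "job-1");
        summary.put("flink.max-committed-checkpoint-id", "42");
        summary.put(watermarkKey(null), "1610000000000");

        System.out.println(carryOverFlinkProperties(summary));
    }
}
```

In a real rewrite, each carried entry would presumably be applied to the `RewriteFiles` operation before commit, for example through the `set(property, value)` method that snapshot-producing operations expose. Whether that survives a concurrent Flink commit is exactly the open question above.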
