liubo1022126 commented on pull request #3130:
URL: https://github.com/apache/iceberg/pull/3130#issuecomment-921432832


   There is a problem with this PR. When the write job starts without 
state, it reads the watermark from the current snapshot. If the current 
snapshot's operation is not an append (for example, a delete or rewrite), 
no watermark can be found there, because watermark advancement is only 
implemented in the streaming write path.
   
   I can think of two solutions:
   1. Add watermark propagation to the other operations, such as delete and 
rewrite. The problem with this idea is that when a streaming write and a file 
rewrite run concurrently, it is common for new data to be committed between 
the start and the end of the rewrite. That is, suppose snapshot s1 has 
watermark w1 when the rewrite begins, the streaming write then commits 
snapshot s2 with watermark w2, and the rewrite finally completes as snapshot 
s3 with watermark w3. To calculate w3, in addition to carrying w1 forward, we 
must also take w2 into account. This looks complicated and hard to understand.
   2. Keep the current implementation in this PR and only record the 
watermark in append operations. When we need the table's current watermark, 
we walk back through the snapshot history until we find the watermark 
recorded by the most recent append operation.
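
   To make solution 2 concrete, here is a minimal sketch of the backtracking 
lookup. It deliberately does not use the Iceberg API: `SnapshotInfo`, 
`latestWatermark`, and the summary key `"watermark"` are hypothetical 
stand-ins for illustration, and the property name used in this PR may differ.

```java
import java.util.Map;
import java.util.OptionalLong;

// Simplified stand-in for a table's snapshot chain; NOT the Iceberg API.
class WatermarkLookup {
    // Hypothetical summary key; the property name used in the PR may differ.
    static final String WATERMARK_KEY = "watermark";

    static final class SnapshotInfo {
        final long id;
        final Long parentId;               // null for the first snapshot
        final String operation;            // e.g. "append", "delete", "replace"
        final Map<String, String> summary; // snapshot summary properties

        SnapshotInfo(long id, Long parentId, String operation,
                     Map<String, String> summary) {
            this.id = id;
            this.parentId = parentId;
            this.operation = operation;
            this.summary = summary;
        }
    }

    // Walk from the current snapshot toward its ancestors and return the
    // watermark of the most recent append that recorded one.
    static OptionalLong latestWatermark(Map<Long, SnapshotInfo> snapshots,
                                        Long currentId) {
        Long id = currentId;
        while (id != null) {
            SnapshotInfo snapshot = snapshots.get(id);
            if (snapshot == null) {
                break; // ancestor is no longer available (e.g. expired)
            }
            if ("append".equals(snapshot.operation)) {
                String value = snapshot.summary.get(WATERMARK_KEY);
                if (value != null) {
                    return OptionalLong.of(Long.parseLong(value));
                }
            }
            id = snapshot.parentId; // skip deletes, rewrites, etc.
        }
        return OptionalLong.empty();
    }
}
```

   In the s1/s2/s3 scenario from solution 1, starting the lookup at s3 (the 
rewrite) skips s3 and returns w2 from s2, so the rewrite never has to compute 
a watermark of its own.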
   
   I prefer solution 2. What do you think? @stevenzwu @rdblue @openinx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


