kbendick commented on pull request #3039:
URL: https://github.com/apache/iceberg/pull/3039#issuecomment-914870892


   Now that I’m thinking about it: if we wanted to read all of the data that
was committed or updated at or after some time X, then we’d want to grab all
changed data starting from time X (inclusive).
   
   Could we get at least an approximation (possibly with duplicates) by
checking the metadata from the commit at timestamp X, such as the files it
added or updated?
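   To make the metadata-based approximation concrete, here is a minimal sketch
under stated assumptions. `Snapshot` and `files_committed_since` are
hypothetical names, not Iceberg's API, and the sketch assumes a simple linear
history where each commit records its timestamp and the data files it added.
It collects every file added by a commit at or after the start timestamp
(inclusive), which is the set a reader would need to scan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    # Hypothetical, simplified stand-in for commit metadata (not Iceberg's API).
    snapshot_id: int
    timestamp_ms: int  # commit time in epoch milliseconds
    added_files: List[str] = field(default_factory=list)


def files_committed_since(snapshots: List[Snapshot], start_ms: int) -> List[str]:
    """Return the files added by every commit at or after start_ms (inclusive)."""
    files: List[str] = []
    # Walk commits in timestamp order so files come out oldest-first.
    for snap in sorted(snapshots, key=lambda s: s.timestamp_ms):
        if snap.timestamp_ms >= start_ms:
            files.extend(snap.added_files)
    return files


history = [
    Snapshot(1, 1_000, ["a.parquet"]),
    Snapshot(2, 2_000, ["b.parquet"]),
    Snapshot(3, 3_000, ["c.parquet", "d.parquet"]),
]
print(files_committed_since(history, 2_000))
# ['b.parquet', 'c.parquet', 'd.parquet']
```

   A real implementation would also have to follow snapshot lineage (parent
snapshots) rather than assume one linear, strictly ordered history.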
   
   I feel that if users don’t mind possibly getting extra rows, they might
also want (or even prefer) to get changed data as of a given timestamp.
   
   We’d still need this PR, but this would also line up more closely with how
Delta currently handles append-only streams: users can choose to get all rows
committed as of a timestamp and then _possibly_ get duplicates because a
file was updated (even if the row itself was not).
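   The duplicate case can be illustrated with a small, hypothetical model
(not Iceberg's or Delta's API): each commit lists the files it added and the
rows in each file, and the reader emits every row of every file added at or
after the start timestamp without deduplicating. When an update rewrites a
file, the unchanged row is copied into the new file and gets emitted a second
time. For simplicity the sketch ignores the files an overwrite removes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (id, value)


@dataclass
class Snapshot:
    # Hypothetical model: file name -> rows contained in that added file.
    timestamp_ms: int
    added_files: Dict[str, List[Row]] = field(default_factory=dict)


def rows_since(snapshots: List[Snapshot], start_ms: int) -> List[Row]:
    """Emit every row of every file added at or after start_ms (inclusive).

    Rows are not deduplicated, so a row carried over unchanged into a
    rewritten file is emitted again.
    """
    rows: List[Row] = []
    for snap in sorted(snapshots, key=lambda s: s.timestamp_ms):
        if snap.timestamp_ms >= start_ms:
            for file_rows in snap.added_files.values():
                rows.extend(file_rows)
    return rows


history = [
    # Append: a.parquet holds two rows.
    Snapshot(1_000, {"a.parquet": [(1, "x"), (2, "y")]}),
    # Update row 2: a.parquet is rewritten as b.parquet, and row 1 is
    # copied over unchanged.
    Snapshot(2_000, {"b.parquet": [(1, "x"), (2, "y2")]}),
]
print(rows_since(history, 1_000))
# [(1, 'x'), (2, 'y'), (1, 'x'), (2, 'y2')]
```

   Row `(1, 'x')` never changed but appears twice: that is the trade-off of
preferring duplicates over losing data.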
   
   The only part of this PR it would affect is maybe the config’s name (since
it’s a separate feature), but it could be nice to have. I can envision use
cases for it where you’d prefer duplicates to losing data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


