kbendick commented on pull request #3039: URL: https://github.com/apache/iceberg/pull/3039#issuecomment-914870892
Now that I'm thinking about it: if we wanted to read all of the data that was committed at, or updated as of, some time X, then we'd want to grab all changed data from time X (inclusive). Could we get at least an approximation (with possible duplicates) by checking the metadata from the commit at timestamp X, such as the updated files? I feel that users who don't mind possibly getting extra rows might prefer, or also want, to get changed data as of a given timestamp.

We'd still need this PR, but this would also line up more closely with how Delta currently handles append-only streams: users can choose to get all rows committed as of a timestamp and then _possibly_ get duplicates because the file was updated (even if the row was not).

The only part of this PR this would affect is maybe the config's name (as it's a separate feature), but it could be nice to have. I can envision use cases for it, where you'd prefer duplicates to losing data.
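
To make the idea a bit more concrete, here is a rough sketch in Python. The `Snapshot` class, its fields, and `files_changed_since` are hypothetical stand-ins made up for illustration; they are not the Iceberg API, and the real snapshot metadata is richer than this. The sketch assumes each commit records the files it added, and it simply collects every file added at or after time X (inclusive).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snapshot:
    """Hypothetical stand-in for a table snapshot; not the Iceberg API."""
    snapshot_id: int
    timestamp_ms: int
    operation: str  # e.g. "append" or "overwrite"
    added_files: List[str] = field(default_factory=list)


def files_changed_since(snapshots: List[Snapshot], start_ms: int) -> List[str]:
    """Return files added by any snapshot committed at or after start_ms (inclusive)."""
    files: List[str] = []
    for snap in sorted(snapshots, key=lambda s: s.timestamp_ms):
        if snap.timestamp_ms >= start_ms:
            # Rewritten files are included too, so unchanged rows inside
            # them may be read again (the "possible duplicates" case).
            files.extend(snap.added_files)
    return files


history = [
    Snapshot(1, 1000, "append", ["a.parquet"]),
    Snapshot(2, 2000, "append", ["b.parquet"]),
    Snapshot(3, 3000, "overwrite", ["a-rewritten.parquet"]),
]

changed = files_changed_since(history, 2000)
print(changed)
```

With X = 2000, the result contains `b.parquet` and `a-rewritten.parquet`. Every row in the rewritten file is read again even if only one row in it changed, which is the trade-off described above: duplicates rather than lost data.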
