Dam1029 edited a comment on pull request #3358: URL: https://github.com/apache/iceberg/pull/3358#issuecomment-950256939
> Thanks for this contribution @Dam1029! > > As the function you’ve changed in SnapshotUtil is used in other places, we should be sure that it’s being tested for more than just the scenario that your patch covers (though greatly appreciate you adding these tests!). > > I want to draw attention to the fact that a `null` snapshotId is a valid value, so we might want to handle this case elsewhere in the code. As the oldest ancestor could potentially be null (the start of the table). > > Also, in the case of expired snapshots, we should think about what exactly the intended behavior should be. For example, if a user has `failOnDataLoss` as true, then some sort of exception might be expected. Though I definitely agree that an NPE is not a good exception in any case, and we should handle that more explicitly (even if we throw an exception still). > > One other way I think an NPE or an indefinite read-loop could happen would be if somebody does a CreateOrReplace operation (replace table). As that also generates a `null` initial snapshot. > > I can try to add some unit tests around that if you don’t have the time. I think your patch potentially might fix that too (but would need to check). > > I’ll do my best to check out this patch and run some more extensive local tests over it too. It might be sufficient to return null (which is a valid snapshot id) and then reset to the initial offset - though I’m just thinking out loud here as I need to check this out and play around. > > Again, thank you for your contribution! Thank you for you reply. I agree with you that null snapshotId is a valid value. The method oldestSnapshot() is used to return oldest and available snapshot. It only return null if there is no current snapshot in the table, such as empty table. We suppose iceberg table have not executed expired snapshot action. Then the table's oldest snapshot_id is not null and the oldest parent_id is null. The method oldestSnapshot() can current return table's oldest snapshot. After do expired snapshots action, the both oldest snapshot_id and parent_id are not null. In the case, I think we should be return the oldest available snapshot but not null value. Finally, I really appreciate that you can help me double check and run some more extensive local tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
