Dam1029 edited a comment on pull request #3358:
URL: https://github.com/apache/iceberg/pull/3358#issuecomment-950256939


   > Thanks for this contribution @Dam1029!
   > 
   > As the function you’ve changed in SnapshotUtil is used in other places, we 
should be sure that it’s being tested for more than just the scenario that your 
patch covers (though greatly appreciate you adding these tests!).
   > 
   > I want to draw attention to the fact that a `null` snapshotId is a valid 
value, so we might want to handle this case elsewhere in the code. As the 
oldest ancestor could potentially be null (the start of the table).
   > 
   > Also, in the case of expired snapshots, we should think about what exactly 
the intended behavior should be. For example, if a user has `failOnDataLoss` as 
true, then some sort of exception might be expected. Though I definitely agree 
that an NPE is not a good exception in any case, and we should handle that more 
explicitly (even if we throw an exception still).
   > 
   > One other way I think an NPE or an indefinite read-loop could happen would 
be if somebody does a CreateOrReplace operation (replace table). As that also 
generates a `null` initial snapshot.
   > 
   > I can try to add some unit tests around that if you don’t have the time. I 
think your patch potentially might fix that too (but would need to check).
   > 
   > I’ll do my best to check out this patch and run some more extensive local 
tests over it too. It might be sufficient to return null (which is a valid 
snapshot id) and then reset to the initial offset - though I’m just thinking 
out loud here as I need to check this out and play around.
   > 
   > Again, thank you for your contribution!
   
   Thank you for you reply. I agree with you that null snapshotId is a valid 
value. The method oldestSnapshot() is used to return oldest and available 
snapshot. It only return null if there is no current snapshot in the table, 
such as empty table.
   
   We suppose iceberg table have not executed expired snapshot action. Then the 
table's oldest snapshot_id is not null and the oldest parent_id is null. The 
method oldestSnapshot() can current return table's oldest snapshot. 
   
   After do expired snapshots action, the both oldest snapshot_id and parent_id 
are not null. In the case, I think we should be return the oldest available 
snapshot but not null value.
   
   Finally, I really appreciate that you can help me double  check and run some 
more extensive local tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to