rdblue commented on pull request #3258:
URL: https://github.com/apache/iceberg/pull/3258#issuecomment-940170086


   @Reo-LEI, the approach in #3103 is similar to what I suggested in this 
comment: https://github.com/apache/iceberg/pull/3258#issuecomment-939144452. 
That approach works and is safe, but it still runs checks that I think are 
unnecessary for the CDC use case. I think it would be better not to run the 
validation at all if I'm right that it is unnecessary.
   
   > If user run deleteOrphanFiles action or delete referenced data file by 
manually/automatically program which is implement by user before flink commit, 
I think this validation can prevent to commit this not exists files.
   
   Deleting orphan files does not affect correctness because the files are not 
referenced.
   
   Removing referenced data files (physically or logically) through any process 
other than `expireSnapshots` is not supported. If you make changes to files 
underneath a table, Iceberg makes no correctness guarantees.
   
   Neither of those cases is relevant to the problem here. The problem in 
#2482 is that the validation is incorrectly configured and very likely not 
required at all.
   
   > I think the proper way to resolve #2482 is 
MergingSnapshotProducer.validationHistory shop to travel the not exists 
snapshots which are delete by expireSnapshots action
   
   This is partially correct. The validation should stop trying to use 
snapshots that have been expired. But doing that by ignoring expired snapshots 
is not correct. Instead, the validation should be configured to not require the 
old snapshots.
   
   Another way of thinking about this is that the validation is _requesting_ 
all versions of the table back to the beginning of table history. You're right 
that it doesn't _need_ all of those versions. But the right way to fix this is 
to stop _requesting_ them, rather than breaking the check by ignoring requested 
versions that aren't available. Does that make sense?
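   
   To make the distinction between _requesting_ and _needing_ concrete, here is 
a minimal toy model in Python. It is not Iceberg's implementation: the function 
`snapshots_to_validate`, its parameters, and the dictionary layout are all 
hypothetical and exist only for illustration. The walk goes from the current 
snapshot back through parents until it reaches the configured starting 
snapshot. With no starting snapshot, it asks for every ancestor and fails once 
it reaches one that has been expired.

```python
from typing import Dict, List, Optional


def snapshots_to_validate(
    parents: Dict[int, Optional[int]],
    current_id: int,
    start_id: Optional[int],
) -> List[int]:
    """Return the snapshot ids a validation would have to inspect.

    Hypothetical model: `parents` maps each snapshot id still present in
    table metadata to its parent id (None for the first snapshot).
    Expired snapshots are simply absent from the mapping.
    """
    requested = []
    cursor = current_id
    # Walk back until the start snapshot (exclusive) or the beginning of history.
    while cursor is not None and cursor != start_id:
        if cursor not in parents:
            # The check asked for a snapshot that no longer exists. Skipping it
            # silently would weaken the check, so the model fails loudly instead.
            raise LookupError(f"snapshot {cursor} was requested but is expired")
        requested.append(cursor)
        cursor = parents[cursor]
    return requested


# History was 1 <- 2 <- 3 <- 4; snapshots 1 and 2 have since been expired.
parents = {3: 2, 4: 3}

# Starting from snapshot 3 only requests what is still available.
bounded = snapshots_to_validate(parents, current_id=4, start_id=3)

# With no starting snapshot the walk requests all history and hits snapshot 2.
try:
    snapshots_to_validate(parents, current_id=4, start_id=None)
    unbounded_failed = False
except LookupError:
    unbounded_failed = True
```

   In this model the fix is to pass a suitable `start_id` (or skip the check 
when it isn't required), not to change the loop so that it ignores missing 
snapshots.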


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


