rdblue commented on pull request #3258: URL: https://github.com/apache/iceberg/pull/3258#issuecomment-940170086
@Reo-LEI, the approach in #3103 is similar to what I suggested in this comment: https://github.com/apache/iceberg/pull/3258#issuecomment-939144452. That approach works and is safe, but it still runs checks that I think are unnecessary for the CDC use case. I think it would be better not to run the validation at all if I'm right that it is unnecessary.

> If user run deleteOrphanFiles action or delete referenced data file by manually/automatically program which is implement by user before flink commit, I think this validation can prevent to commit this not exists files.

Deleting orphan files does not affect correctness because the files are not referenced. Removing referenced data files (physically or logically) through any process other than `expireSnapshots` is not supported. If you make changes to files underneath a table, Iceberg makes no correctness guarantees.

Neither of those cases is relevant to the problem here. The problem in #2482 is that the validation is incorrectly configured and very likely not required at all.

> I think the proper way to resolve #2482 is MergingSnapshotProducer.validationHistory shop to travel the not exists snapshots which are delete by expireSnapshots action

This is partially correct. The validation should stop trying to use snapshots that have been expired. But doing that by ignoring expired snapshots is not correct. Instead, the validation should be configured to not require the old snapshots.

Another way of thinking about this is that the validation is _requesting_ all versions of the table back to the beginning of table history. You're right that it doesn't _need_ all of those versions. But the right way to fix this is to stop _requesting_ them rather than breaking the check by ignoring when requested versions aren't available.

Does that make sense?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
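To illustrate the distinction between not _requesting_ old versions and ignoring versions that are missing, here is a minimal toy model in Python. It is not Iceberg code: `Snapshot`, `history_to_validate`, and the `starting_id` parameter are hypothetical names invented for this sketch, and the only assumption carried over from the discussion is that validation walks parent pointers back through table history.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot with a parent pointer."""
    snapshot_id: int
    parent_id: Optional[int]


def history_to_validate(
    snapshots: Dict[int, Snapshot],
    current_id: int,
    starting_id: Optional[int],
) -> List[int]:
    """Return the snapshot ids a validation would need to inspect.

    Walks parent pointers from current_id back to, but not including,
    starting_id. A starting_id of None requests everything back to the
    beginning of table history. A requested snapshot that is no longer
    available is an error; it is never silently skipped.
    """
    ids: List[int] = []
    cursor: Optional[int] = current_id
    while cursor is not None and cursor != starting_id:
        snap = snapshots.get(cursor)
        if snap is None:
            raise LookupError(
                f"snapshot {cursor} was requested but is no longer available"
            )
        ids.append(cursor)
        cursor = snap.parent_id
    return ids


# History was 1 <- 2 <- 3 <- 4, and snapshots 1 and 2 have been expired.
remaining = {3: Snapshot(3, 2), 4: Snapshot(4, 3)}

# Bounded request: only changes made after snapshot 3 are checked.
bounded = history_to_validate(remaining, current_id=4, starting_id=3)

# Unbounded request: asks for expired snapshot 2 and fails loudly.
try:
    history_to_validate(remaining, current_id=4, starting_id=None)
    unbounded_failed = False
except LookupError:
    unbounded_failed = True
```

In this sketch the bounded call never touches the expired snapshots, so the check stays strict without failing. Making the unbounded call pass by skipping the missing snapshot would instead weaken the check for every caller.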
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
