vincentpoon commented on issue #5936: URL: https://github.com/apache/iceberg/issues/5936#issuecomment-1278209330
@rdblue Hmm, I guess it depends on what "correct" behavior means here, but if the partition stats reflect values that can never be returned by a query (because the files containing those values have been deleted), then that seems incorrect to me.

Changing the behavior would also be a performance improvement, particularly when the manifests are quite large, as they are in our use case. Filtering on the partition stats at the manifest list level means certain manifests don't have to be read at all. With incorrect partition stats, those manifests are read even when they contain no files that can answer the query.

I agree that a mode that simply drops files rather than keeping references to them would solve the problem. But then I would ask: what is the purpose of keeping deleted files around in the manifests with "Status: 2" (deleted)?
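To make the concern concrete, here is a minimal sketch of the pruning behavior described above. It is a toy model, not Iceberg's implementation: `Entry`, `Manifest`, `summary`, and `manifests_to_read` are hypothetical names, and a single integer partition value stands in for the real partition field summaries. The only detail taken from the table format is the entry status code 2 for deleted.

```python
from dataclasses import dataclass

# Manifest entry status codes (2 = deleted, as mentioned in the comment).
EXISTING, ADDED, DELETED = 0, 1, 2


@dataclass
class Entry:
    status: int
    partition_value: int  # simplified: one integer partition field


@dataclass
class Manifest:
    name: str
    entries: list

    def summary(self, include_deleted: bool):
        """Return (lower, upper) partition bounds, or None if no entry counts."""
        values = [
            e.partition_value
            for e in self.entries
            if include_deleted or e.status != DELETED
        ]
        return (min(values), max(values)) if values else None


def manifests_to_read(manifests, lo, hi, include_deleted):
    """Select manifests whose partition bounds overlap the query range [lo, hi]."""
    selected = []
    for m in manifests:
        bounds = m.summary(include_deleted)
        if bounds is not None and bounds[0] <= hi and bounds[1] >= lo:
            selected.append(m.name)
    return selected


manifests = [
    # m1 has only one live file (partition 20); partition 5 was deleted.
    Manifest("m1", [Entry(DELETED, 5), Entry(EXISTING, 20)]),
    Manifest("m2", [Entry(ADDED, 7)]),
]

# Query for partition values in [0, 10].
stale = manifests_to_read(manifests, 0, 10, include_deleted=True)
live = manifests_to_read(manifests, 0, 10, include_deleted=False)
print(stale)  # bounds include the deleted entry, so m1 is still read
print(live)   # bounds cover live entries only, so m1 is pruned
```

In this sketch, when the bounds include the deleted entry, `m1` has to be opened even though none of its live files can match the query. With bounds computed from live entries only, `m1` is skipped at the manifest list level.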
