ForeverAngry commented on PR #2695: URL: https://github.com/apache/iceberg-python/pull/2695#issuecomment-3700128772
> Thanks for the PR! > > I dont think caching `residual_evaluator_of` is a good idea. Its current use explicitly creates a new evaluator instance for each data file for thread safety: > > https://github.com/apache/iceberg-python/blob/b0a787814e409f53725b9c924a18b6d81c647a68/pyiceberg/table/__init__.py#L1900-L1910 > > This is likely because ResidualEvaluator instances are stateful. They store `self.struct` during evaluation > > https://github.com/apache/iceberg-python/blob/b0a787814e409f53725b9c924a18b6d81c647a68/pyiceberg/expressions/visitors.py#L1782-L1784 I spent a good deal of time and tracing the scan hot path, `residual_evaluator_of` invoked once per data file and the evaluator is stateful, just as you suggested, Keven. So caching the evaluator instance is not safe and caching the factory doesn’t really buy performance. Maybe your right @kevinjqliu this PR might not make sense. @Fokko do you agree. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
