ForeverAngry commented on PR #2695:
URL: https://github.com/apache/iceberg-python/pull/2695#issuecomment-3700128772

   > Thanks for the PR!
   > 
   > I dont think caching `residual_evaluator_of` is a good idea. Its current 
use explicitly creates a new evaluator instance for each data file for thread 
safety:
   > 
   > 
https://github.com/apache/iceberg-python/blob/b0a787814e409f53725b9c924a18b6d81c647a68/pyiceberg/table/__init__.py#L1900-L1910
   > 
   > This is likely because ResidualEvaluator instances are stateful. They 
store `self.struct` during evaluation
   > 
   > 
https://github.com/apache/iceberg-python/blob/b0a787814e409f53725b9c924a18b6d81c647a68/pyiceberg/expressions/visitors.py#L1782-L1784
   
    I spent a good deal of time and tracing the scan hot path, 
`residual_evaluator_of` invoked once per data file and the evaluator is 
stateful, just as you suggested, Keven. So caching the evaluator instance is 
not safe and caching the factory doesn’t really buy performance. Maybe your 
right @kevinjqliu this PR might not make sense.  @Fokko do you agree. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to