ForeverAngry commented on code in PR #2695:
URL: https://github.com/apache/iceberg-python/pull/2695#discussion_r2653566292


##########
pyiceberg/expressions/visitors.py:
##########
@@ -1970,11 +1975,37 @@ def residual_for(self, partition_data: Record) -> 
BooleanExpression:
         return self.expr
 
 
-def residual_evaluator_of(
+_DEFAULT_RESIDUAL_EVALUATOR_CACHE_SIZE = 128

Review Comment:
   I probably should have put this in my initial PR reasoning; my _guiding 
star_ here was that this feature should lean toward performance safety rather 
than super-tight memory tuning. 
   
   Residual evaluators are on the hot path for pruning, so if we miss the cache 
we end up rebinding expressions — and that’s relatively expensive in Python. A 
cache size of 128 lines up with common LRU defaults, and in practice it helps 
cut down query time and unnecessary I/O. 
   
   In my own experience, PyIceberg usually runs on instances with plenty of RAM 
(multiple GBs), so using a bit more memory to get more predictable performance 
feels like a good trade-off. I’ll also acknowledge that this comes from my 
experience, so there may be some bias there — but I think it’s a reasonable 
default for most real-world workloads.  I'm happy to adjust if you feel 
strongly, but maybe we go with 64? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to