pitrou commented on PR #40041:
URL: https://github.com/apache/arrow/pull/40041#issuecomment-1941318931

   Yes, 5000 would be much better here.
   
   Le 13 février 2024 11:45:25 GMT+01:00, Yue ***@***.***> a écrit :
   >@niyue commented on this pull request.
   >
   >
   >
   >> @@ -23,11 +23,7 @@
   > 
   > namespace gandiva {
   > 
   >-#ifdef GANDIVA_ENABLE_OBJECT_CODE_CACHE
   > static const size_t DEFAULT_CACHE_SIZE = 500000;
   >
   >Indeed, the unit in question pertains to the number of entries rather than 
bytes, but there's no misunderstanding of the cache size and the original value 
of `500` was also defined in terms of the number of entries. This PR aims to 
address an oversight by removing a previously missed flag.
   >
   >This link 
[[1]](https://github.com/apache/arrow/pull/11193#issue-1001547667) has more 
data about module cache vs. object code cache in Gandiva, and in the limited 
expressions tested, the memory is down to 0.8%~6% after using object code 
cache. 
   >
   >The current default value 500 is likely too small in production, and it 
will probably only use 8MB of memory using the stats in link [1], but you are 
right it seems 500000 is indeed too large (I have no idea how it was defined 
this way and I should do more calculation for it). According to the stats in 
link [1], when using module cache, it will probably take 750MB memory for 500 
entries.
   >
   >Other initiatives [2] have attempted to enhance the cache eviction 
algorithm, but such attempts were reversed due to other issues [3]. I've 
previously reviewed these efforts and believe I have distinct ideas for 
advancing Gandiva's cache. I plan to propose a PR if I fully understand the 
workflow. To avoid complicating this PR, my goal is solely to refine the 
default value. Would it be acceptable to adopt a more conservative default 
value, such as `5000`/`10000`?
   >
   >[1] https://github.com/apache/arrow/pull/11193#issue-1001547667 
   >[2] https://github.com/apache/arrow/pull/10465
   >[3] https://github.com/apache/arrow/pull/11957
   >
   >-- 
   >Reply to this email directly or view it on GitHub:
   >https://github.com/apache/arrow/pull/40041#discussion_r1487603910
   >You are receiving this because you were mentioned.
   >
   >Message ID: ***@***.***>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to