pitrou commented on PR #40041:
URL: https://github.com/apache/arrow/pull/40041#issuecomment-1941318931
Yes, 5000 would be much better here.
On 13 February 2024 11:45:25 GMT+01:00, Yue ***@***.***> wrote:
>@niyue commented on this pull request.
>
>
>
>> @@ -23,11 +23,7 @@
>
> namespace gandiva {
>
>-#ifdef GANDIVA_ENABLE_OBJECT_CODE_CACHE
> static const size_t DEFAULT_CACHE_SIZE = 500000;
>
>Indeed, the unit here is the number of entries rather than bytes. The cache
size was not misunderstood, though: the original value of `500` was also
defined as a number of entries. This PR aims to fix an oversight by removing
a flag that was previously missed.
>
>This link
[[1]](https://github.com/apache/arrow/pull/11193#issue-1001547667) has more
data about the module cache vs. the object code cache in Gandiva. For the
limited set of expressions tested, memory usage drops to 0.8%~6% of the
module cache's after switching to the object code cache.
>
>The current default value of 500 is likely too small for production; based
on the stats in link [1], it will probably use only about 8MB of memory. You
are right, though, that 500000 is indeed too large (I have no idea how it was
chosen, and I should do more calculation for it). According to the same
stats, the module cache would probably take about 750MB of memory for 500
entries.
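
For a rough sense of scale, the figures quoted above can be extrapolated linearly to the candidate defaults. The sketch below is only back-of-the-envelope arithmetic: the per-entry costs are derived from the ~8MB and ~750MB per 500 entries mentioned above, the names `kObjectCacheMbPerEntry`, `kModuleCacheMbPerEntry` and `EstimateCacheMb` are hypothetical and not part of Gandiva, and real usage depends on the expressions being cached.

```cpp
#include <cstddef>

// Assumed per-entry costs, derived from the figures quoted in this thread
// (not measured here): ~8 MB per 500 entries for the object code cache and
// ~750 MB per 500 entries for the module cache.
constexpr double kObjectCacheMbPerEntry = 8.0 / 500.0;
constexpr double kModuleCacheMbPerEntry = 750.0 / 500.0;

// Hypothetical helper: linear estimate of the memory (in MB) held by a cache
// with `entries` entries, assuming every entry costs `mb_per_entry`.
constexpr double EstimateCacheMb(std::size_t entries, double mb_per_entry) {
  return static_cast<double>(entries) * mb_per_entry;
}
```

Under these assumptions, `EstimateCacheMb(5000, kObjectCacheMbPerEntry)` comes to roughly 80 MB and `EstimateCacheMb(10000, kObjectCacheMbPerEntry)` to roughly 160 MB, whereas 500000 entries would be on the order of 8000 MB.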
>
>Other initiatives [2] have attempted to improve the cache eviction
algorithm, but they were reverted due to other issues [3]. I have reviewed
these efforts and believe I have different ideas for improving Gandiva's
cache; I plan to propose a PR once I fully understand the workflow. To avoid
complicating this PR, my only goal here is to fix the default value. Would it
be acceptable to adopt a more conservative default value, such as
`5000`/`10000`?
>
>[1] https://github.com/apache/arrow/pull/11193#issue-1001547667
>[2] https://github.com/apache/arrow/pull/10465
>[3] https://github.com/apache/arrow/pull/11957