niyue commented on code in PR #40041:
URL: https://github.com/apache/arrow/pull/40041#discussion_r1487603910
##########
cpp/src/gandiva/cache.cc:
##########
@@ -23,11 +23,7 @@
namespace gandiva {
-#ifdef GANDIVA_ENABLE_OBJECT_CODE_CACHE
static const size_t DEFAULT_CACHE_SIZE = 500000;
Review Comment:
Indeed, the unit in question pertains to the number of entries rather than
bytes, but there's no misunderstanding of the cache size and the original value
of `500` was also defined in terms of the number of entries. This PR aims to
address an oversight by removing a previously missed flag.
This link [[1]](https://github.com/apache/arrow/pull/11193#issue-1001547667)
has more data about module cache vs. object code cache in Gandiva, and in the
limited expressions tested, the memory is down to 0.8%~6% after using object
code cache.
The current default value 500 is likely too small in production, and it will
probably only use 8MB of memory using the stats in link [1], but you are right
it seems 500000 is indeed too large (I have no idea how it was defined this way
and I should do more calculation for it). According to the stats in link [1],
when using module cache, it will probably take 750MB memory for 500 entries.
Other initiatives [2] have attempted to enhance the cache eviction
algorithm, but such attempts were reversed due to other issues [3]. I've
previously reviewed these efforts and believe I have distinct ideas for
advancing Gandiva's cache. I plan to propose a PR if I fully understand the
workflow. To avoid complicating this PR, my goal is solely to refine the
default value. Would it be acceptable to adopt a more conservative default
value, such as `5000`/`10000`? Thanks.
[1] https://github.com/apache/arrow/pull/11193#issue-1001547667
[2] https://github.com/apache/arrow/pull/10465
[3] https://github.com/apache/arrow/pull/11957
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]