Hi Projjal,
The main issue here is computing the cost accurately (is it computation
runtime? memory footprint? can you measure the computation time
reliably, regardless of system noise, e.g. other threads and processes?).
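For what it's worth, here is a minimal sketch (not from this thread) of one
way to look at that: wall-clock time picks up scheduling noise from other
threads and processes, while per-thread CPU time (CLOCK_THREAD_CPUTIME_ID on
POSIX) only counts time the building thread actually spent on the CPU. The
busy loop is just a stand-in for the actual projector/filter build.

#include <chrono>
#include <ctime>
#include <iostream>

int main() {
  // Wall-clock time: includes delays caused by other threads and processes.
  auto wall_start = std::chrono::steady_clock::now();
  // Per-thread CPU time: only counts time this thread spent on the CPU.
  timespec cpu_start;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu_start);

  // Stand-in for the expensive expression build (e.g. building a projector).
  volatile double sink = 0.0;
  for (int i = 0; i < 10000000; ++i) sink += i * 0.5;

  timespec cpu_end;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu_end);
  auto wall_end = std::chrono::steady_clock::now();

  double wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  double cpu_ms = (cpu_end.tv_sec - cpu_start.tv_sec) * 1e3 +
                  (cpu_end.tv_nsec - cpu_start.tv_nsec) / 1e6;
  std::cout << "wall: " << wall_ms << " ms, cpu: " << cpu_ms << " ms\n";
  return 0;
}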
Intuitively, if the LRU cache shows too many misses, a simple measure is
to increase its size ;-)
Last question: have you considered a second-level on-disk cache? Numba
uses such a cache with good results:
https://numba.readthedocs.io/en/stable/developer/caching.html
Regards
Antoine.
On 20/04/2021 at 06:28, Projjal Chanda wrote:
Hi,
We currently have a cache [1] in Gandiva that caches the built projector or
filter modules with an LRU-based eviction policy. However, since the cost of
building different expressions is not uniform, it makes sense to have an
eviction policy that takes into account the cost of a cache miss (while also
discounting items that have not been used recently).
We are planning to use the GreedyDual-Size algorithm [2], which seems fit for
the purpose. The algorithm is quite simple:
- Each item has a cost (build time in our case), and the item with the lowest
  cost (c_min) is evicted. The costs of all other items are reduced by c_min.
- On a cache hit, the item's cost is restored to its original value.
This can be implemented with a priority queue, and an efficient implementation
can handle both cache hits and evictions in O(log k) time.
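For illustration, here is a minimal, self-contained sketch of a
GreedyDual-style cache (names like GreedyDualCache, Put and Get are
hypothetical, not Gandiva APIs). Instead of literally deducting c_min from
every item, it uses the inflation-clock trick from the GreedyDual literature:
each entry's priority is clock + cost, and the clock is advanced to the
evicted entry's priority, which keeps hits and evictions at O(log k).

#include <cstddef>
#include <iostream>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class GreedyDualCache {
 public:
  explicit GreedyDualCache(std::size_t capacity) : capacity_(capacity) {}

  // Insert a value with its build cost; evicts the lowest-priority entry
  // when the cache is full.
  void Put(const Key& key, Value value, double cost) {
    if (entries_.count(key) > 0) return;  // keep the sketch simple: no update
    if (entries_.size() >= capacity_) Evict();
    auto it = queue_.emplace(clock_ + cost, key).first;
    entries_.emplace(key, Entry{std::move(value), cost, it});
  }

  // On a hit, "restore" the entry's cost by re-basing its priority on the
  // current clock (equivalent to undoing the accumulated deductions).
  std::optional<Value> Get(const Key& key) {
    auto found = entries_.find(key);
    if (found == entries_.end()) return std::nullopt;
    Entry& e = found->second;
    queue_.erase(e.queue_it);
    e.queue_it = queue_.emplace(clock_ + e.cost, key).first;
    return e.value;
  }

 private:
  struct Entry {
    Value value;
    double cost;
    typename std::set<std::pair<double, Key>>::iterator queue_it;
  };

  // Evict the entry with the lowest priority and advance the clock to it;
  // this is the O(log k) equivalent of deducting c_min from every item.
  void Evict() {
    auto victim = queue_.begin();
    clock_ = victim->first;
    entries_.erase(victim->second);
    queue_.erase(victim);
  }

  std::size_t capacity_;
  double clock_ = 0.0;                      // sum of "deducted" c_min values
  std::set<std::pair<double, Key>> queue_;  // (priority, key), lowest first
  std::unordered_map<Key, Entry> entries_;
};

int main() {
  GreedyDualCache<std::string, int> cache(2);
  cache.Put("cheap_expr", 1, /*cost=*/1.0);       // priority 1
  cache.Put("expensive_expr", 2, /*cost=*/10.0);  // priority 10
  cache.Put("another_expr", 3, /*cost=*/2.0);     // evicts "cheap_expr"
  std::cout << std::boolalpha
            << "expensive_expr cached: "
            << cache.Get("expensive_expr").has_value()
            << ", cheap_expr cached: "
            << cache.Get("cheap_expr").has_value() << "\n";
  return 0;
}

Note that the GreedyDual-Size variant described in [2] additionally divides
the cost by the object's size, so expression size could be folded into the
priority computation the same way.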
Does anybody have any other suggestions or ideas on this?
[1] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h
[2] https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
Regards,
Projjal