Hi Projjal,

The main issue here is to compute the cost accurately (is it computation runtime? memory footprint? can you measure the computation time accurately, regardless of system noise - e.g. other threads and processes?).

Intuitively, if the LRU cache shows too many misses, a simple measure is to increase its size ;-)

Last question: have you considered a second level on-disk cache? Numba uses such a cache with good results:
https://numba.readthedocs.io/en/stable/developer/caching.html

Regards

Antoine.


Le 20/04/2021 à 06:28, Projjal Chanda a écrit :
Hi,
We currently have a cache[1] in gandiva that caches the built projector or 
filter module with LRU based eviction policy. However since the cost of 
building different expressions is not uniform it makes sense to have a 
different eviction policy that takes into account an associated cost of a cache 
miss (while also discounting the items which have not been recently used). We 
are planning to use an algorithm called GreedyDual-Size Algorithm [2] which 
seems fit for the purpose. The algorithm is quite simple -
Each item has a cost (build time in our case) and item with lowest cost (c_min) 
is evicted. All other items cost are deducted by (c_min)
On cache hit, the item cost is restored to the original value

This can be implemented using a priority queue and an efficient implementation 
of this can handle both cache hit and eviction in O(logk) time.

Does anybody have any other suggestions or ideas on this?

[1] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h 
<https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h>
[2] 
https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
 
<https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html>

Regards,
Projjal

Reply via email to