Ivan, thanks for the detailed explanation. My preference would be to support all of the suggested eviction policies and control them from the configuration. However, it does sound the K-random-LRU or K-randome-LRU-2 will be much easier to support than LIRs.
Don't we already have something like K-random-LRU in place? D. On Tue, Feb 14, 2017 at 9:13 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote: > Hello Igniters, > > I want to share some thoughts about supporting eviction on new page memory > storage. > > I think, we should reconsider eviction policies in their current state. > Current pluggable eviction policy mechanism allows to determine which key > will be evicted from the cache and when eviction should be performed. There > are two issues about it: > > 1) Such mechanism tends to make off-heap storage highly fragmented: batch > of small evicted entries can accumulate needed total amount of free space > in FreeList, but it still would be impossible to store one big entry. > > 2) Growth of cache size implies creating O(n) heap objects and links to > maintain eviction policy, which causes linear growth of GC load. That > doesn't suit us: getting rid of GC overhead was one of the main reasons for > keeping cache data in off-heap memory. > > I propose to use "whole page eviction, off-heap eviction metadata" as > default mode for eviction. It's almost non-deterministic, because actual > eviction result depends on how data is packed into pages, but is more > efficient, and shouldn't cause long-term performance degradation due to > fragmentation. > > I can suggest two ways to implement it: > > 1) *K-random-LRU > *The idea is to allocate big off-heap array, to track timestamp of last > usage for each data page. > When data page on address X is accessed, we store current timestamp in X / > PAGE_SIZE array position. > When there's a need for eviction, we randomly choose K indices of array > (if some of indices point to non-data pages, re-choose them) and evict data > page with minimum timestamp. In case K=5, we'll evict page from 17% least > recently used set with 50% probability <http://math.stackexchange.com > /questions/786392/expectation-of-minimum-of-n-i-i-d-uniform- > random-variables>. > Non-data pages can be simply recognized by having zero in corresponding > position of timestamp array. > If entry is too big and occupies more than one page, only first page will > be tracked, and other will be considered as non-data pages. > *K-random-LRU-2* is perspective variant of this approach. We'll evict page > with minimum time of penultimate access. Implementation is mostly the same, > the difference is that we should store two timestamps for each data page. > LRU-2 outperforms <http://www.cs.cmu.edu/%7Echri > stos/courses/721-resources/p297-o_neil.pdf> LRU by resolving "one-hit > wonder" problem: if page is accessed very rarely, but accidentally accessed > once, it's "protected" from eviction for long time. > Memory overhead: timestamp can be packed into 4 bytes. 100GB off-heap > cache with standard 2KB pages requires 200MB of additional memory (400MB in > case of LRU-2 strategy).* > * > > 2)*Adaptation of CLOCK-Pro > *CLOCK-Pro (article <https://www.usenix.org/legacy > /event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>, > presentation <http://www.slideshare.net/huliang64/clockpro>) is modern > page replacement algorithm, used in actual OS kernels. It is an > approximation of LIRS <http://web.cse.ohio-state.edu > /hpcs/WWW/HTML/publications/papers/TR-02-6.pdf> with significant > advantage for our case: instead of doubly linked list, which is hard to > maintain in off-heap memory, CLOCK-Pro uses only circular list and three > pointers. However, we can't use CLOCK-Pro as is: there's difference between > OS paged memory and Ignite off-heap storage. In OS, if memory page is > swapped out, it still contains the same data, with same relevance and > patterns of access. That's why keeping metadata for swapped-out page (see > Non-resident Cold Pages set) works. In Ignite, in case we evict page, all > data is discarded; entries with same keys may appear in the cache again, > but they will be evenly distributed across the whole storage. > I propose to use simplified version of CLOCK-Pro with only two pointers > and two page sets: hot and cold. Non-resident pages won't be tracked. > To implement it, we should allocate big off-heap array to track flags for > each page. We should store (hot, cold) bit, (accessed, not accessed) bit > and one bit indicating that the page is data page, totally 3 bits per page. > 100GB off-heap cache with standard 2KB pages requires ~20MB of additional > memory, or 50MB in case we don't want to shrink metadata for multiple pages > in one byte. > Memory overhead is really small, but we can't predict average and maximum > time for one eviction, it has to be discovered during an experiment (in > hypothetical worst case, clock hand can travel through the whole array). > Also, hot/cold ratio can affect performance dramatically and should be > tuned well. > > Note about classic "key" eviction: fair per-entry policies (FIFO, LRU, > Sorted) may be useful for some users, and we may want to support them as > well. It's no problem at all to implement them on-heap (more than, current > implementations of EvictionPolicy will work in if we make minor changes in > the code). In off-heap, FIFO is easy to implement (circular buffer), but > LRU (and more effective LIRS, if someone wants it) require doubly linked > lists. The only option I see is using our BPlusTree with some kind of > implicit key, but this implies O(log n) overhead on each entry read access. > > Another note: current GridCacheEvictionManager have eviction backup > synchronization mechanism. Do we want to support it for off-heap storage as > well? > > Please let me know if my understanding of any algorithm or part of the > system is wrong. > > -- > Best Regards, > Ivan Rakov > >