IGNITE-4534 - Ignite 2.0 eviction design

Ivan Rakov Tue, 14 Feb 2017 09:13:45 -0800

Hello Igniters,

I want to share some thoughts about supporting eviction on new pagememory storage.

I think, we should reconsider eviction policies in their current state.Current pluggable eviction policy mechanism allows to determine whichkey will be evicted from the cache and when eviction should beperformed. There are two issues about it:

1) Such mechanism tends to make off-heap storage highly fragmented:batch of small evicted entries can accumulate needed total amount offree space in FreeList, but it still would be impossible to store onebig entry.

2) Growth of cache size implies creating O(n) heap objects and links tomaintain eviction policy, which causes linear growth of GC load. Thatdoesn't suit us: getting rid of GC overhead was one of the main reasonsfor keeping cache data in off-heap memory.

I propose to use "whole page eviction, off-heap eviction metadata" asdefault mode for eviction. It's almost non-deterministic, because actualeviction result depends on how data is packed into pages, but is moreefficient, and shouldn't cause long-term performance degradation due tofragmentation.


I can suggest two ways to implement it:

1) *K-random-LRU

*The idea is to allocate big off-heap array, to track timestamp of lastusage for each data page.When data page on address X is accessed, we store current timestamp in X/ PAGE_SIZE array position.When there's a need for eviction, we randomly choose K indices of array(if some of indices point to non-data pages, re-choose them) and evictdata page with minimum timestamp. In case K=5, we'll evict page from 17%least recently used set with 50% probability<http://math.stackexchange.com/questions/786392/expectation-of-minimum-of-n-i-i-d-uniform-random-variables>.Non-data pages can be simply recognized by having zero in correspondingposition of timestamp array.If entry is too big and occupies more than one page, only first pagewill be tracked, and other will be considered as non-data pages.*K-random-LRU-2* is perspective variant of this approach. We'll evictpage with minimum time of penultimate access. Implementation is mostlythe same, the difference is that we should store two timestamps for eachdata page. LRU-2 outperforms<http://www.cs.cmu.edu/%7Echristos/courses/721-resources/p297-o_neil.pdf>LRU by resolving "one-hit wonder" problem: if page is accessed veryrarely, but accidentally accessed once, it's "protected" from evictionfor long time.Memory overhead: timestamp can be packed into 4 bytes. 100GB off-heapcache with standard 2KB pages requires 200MB of additional memory (400MBin case of LRU-2 strategy).*

*

2)*Adaptation of CLOCK-Pro

*CLOCK-Pro (article<https://www.usenix.org/legacy/event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>,presentation <http://www.slideshare.net/huliang64/clockpro>) is modernpage replacement algorithm, used in actual OS kernels. It is anapproximation of LIRS<http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf>with significant advantage for our case: instead of doubly linked list,which is hard to maintain in off-heap memory, CLOCK-Pro uses onlycircular list and three pointers. However, we can't use CLOCK-Pro as is:there's difference between OS paged memory and Ignite off-heap storage.In OS, if memory page is swapped out, it still contains the same data,with same relevance and patterns of access. That's why keeping metadatafor swapped-out page (see Non-resident Cold Pages set) works. In Ignite,in case we evict page, all data is discarded; entries with same keys mayappear in the cache again, but they will be evenly distributed acrossthe whole storage.I propose to use simplified version of CLOCK-Pro with only two pointersand two page sets: hot and cold. Non-resident pages won't be tracked.To implement it, we should allocate big off-heap array to track flagsfor each page. We should store (hot, cold) bit, (accessed, not accessed)bit and one bit indicating that the page is data page, totally 3 bitsper page. 100GB off-heap cache with standard 2KB pages requires ~20MB ofadditional memory, or 50MB in case we don't want to shrink metadata formultiple pages in one byte.Memory overhead is really small, but we can't predict average andmaximum time for one eviction, it has to be discovered during anexperiment (in hypothetical worst case, clock hand can travel throughthe whole array). Also, hot/cold ratio can affect performancedramatically and should be tuned well.

Note about classic "key" eviction: fair per-entry policies (FIFO, LRU,Sorted) may be useful for some users, and we may want to support them aswell. It's no problem at all to implement them on-heap (more than,current implementations of EvictionPolicy will work in if we make minorchanges in the code). In off-heap, FIFO is easy to implement (circularbuffer), but LRU (and more effective LIRS, if someone wants it) requiredoubly linked lists. The only option I see is using our BPlusTree withsome kind of implicit key, but this implies O(log n) overhead on eachentry read access.

Another note: current GridCacheEvictionManager have eviction backupsynchronization mechanism. Do we want to support it for off-heap storageas well?

Please let me know if my understanding of any algorithm or part of thesystem is wrong.


--
Best Regards,
Ivan Rakov

IGNITE-4534 - Ignite 2.0 eviction design

Reply via email to