Dmitriy,

For the current off-heap approach, we only have a per-entry LRU eviction
policy, which does not fit the page memory architecture well. I like the
adaptation of the CLOCK-Pro algorithm because it requires significantly
less memory to track page usage and is scan-resistant.

I want to back Ivan here and discuss whether we need synchronized evictions
at all. Given that the current implementation has flaws, and that a correct
implementation would have to provide the same protocol guarantees as a
cache update (meaning two-phase commit for transactional caches on every
operation, including reads), I would rather drop synchronized evictions
altogether.

Thoughts?

2017-02-15 0:59 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:

> Ivan, thanks for the detailed explanation. My preference would be to
> support all of the suggested eviction policies and control them from the
> configuration. However, it does sound like K-random-LRU or K-random-LRU-2
> will be much easier to support than LIRS.
>
> Don't we already have something like K-random-LRU in place?
>
> D.
>
> On Tue, Feb 14, 2017 at 9:13 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
>
> > Hello Igniters,
> >
> > I want to share some thoughts about supporting eviction on the new page
> > memory storage.
> >
> > I think we should reconsider eviction policies in their current state.
> > The current pluggable eviction policy mechanism determines which key
> > will be evicted from the cache and when eviction should be performed.
> > There are two issues with it:
> >
> > 1) Such a mechanism tends to make off-heap storage highly fragmented:
> > the space freed by a batch of small evicted entries is scattered across
> > pages, so even though the FreeList accumulates the needed total amount
> > of free space, it may still be impossible to store one big entry.
> >
> > 2) Growth of the cache size implies creating O(n) heap objects and links
> > to maintain the eviction policy, which causes linear growth of GC load.
> > That doesn't suit us: getting rid of GC overhead was one of the main
> > reasons for keeping cache data in off-heap memory.
> >
> > I propose to use "whole page eviction, off-heap eviction metadata" as
> > default mode for eviction. It's almost non-deterministic, because actual
> > eviction result depends on how data is packed into pages, but is more
> > efficient, and shouldn't cause long-term performance degradation due to
> > fragmentation.
> >
> > I can suggest two ways to implement it:
> >
> > 1) *K-random-LRU*
> > The idea is to allocate a big off-heap array that tracks the timestamp
> > of the last access of each data page.
> > When the data page at address X is accessed, we store the current
> > timestamp at array position X / PAGE_SIZE.
> > When eviction is needed, we randomly choose K indices of the array (if
> > some of them point to non-data pages, we re-choose them) and evict the
> > data page with the minimum timestamp. In case K=5, the evicted page
> > will, on average, be among the ~17% least recently used pages (the
> > expected rank of the minimum of 5 uniform samples is 1/6)
> > <http://math.stackexchange.com/questions/786392/expectation-of-minimum-of-n-i-i-d-uniform-random-variables>.
> > Non-data pages can be recognized by a zero in the corresponding position
> > of the timestamp array.
> > If an entry is too big and occupies more than one page, only the first
> > page will be tracked, and the others will be treated as non-data pages.
> > *K-random-LRU-2* is a promising variant of this approach: we'll evict
> > the page with the minimum time of penultimate access. The implementation
> > is mostly the same; the difference is that we store two timestamps for
> > each data page. LRU-2 outperforms
> > <http://www.cs.cmu.edu/%7Echristos/courses/721-resources/p297-o_neil.pdf>
> > LRU by resolving the "one-hit wonder" problem: under plain LRU, a page
> > that is used very rarely, but happens to be accessed once, stays
> > "protected" from eviction for a long time.
> > Memory overhead: a timestamp can be packed into 4 bytes. A 100GB
> > off-heap cache with standard 2KB pages requires 200MB of additional
> > memory (400MB in case of the LRU-2 strategy).
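> >
> > For illustration, a rough sketch of the K-random-LRU bookkeeping
> > described above (class and method names are hypothetical, not the actual
> > Ignite page memory API; a direct ByteBuffer stands in for the off-heap
> > timestamp array):
> >
> >     import java.nio.ByteBuffer;
> >     import java.util.concurrent.ThreadLocalRandom;
> >
> >     public class KRandomLruTracker {
> >         private static final int K = 5;
> >
> >         // 4 bytes per page; slot index = pageAddress / PAGE_SIZE.
> >         // 100GB / 2KB pages = 50M slots * 4 bytes = 200MB of metadata.
> >         private final ByteBuffer timestamps;
> >         private final int pageCnt;
> >
> >         public KRandomLruTracker(int pageCnt) {
> >             this.pageCnt = pageCnt;
> >             timestamps = ByteBuffer.allocateDirect(pageCnt * 4); // Zero-filled.
> >         }
> >
> >         /** Called on each data page access; non-data pages keep the zero marker. */
> >         public void touch(int pageIdx) {
> >             // Seconds-granularity timestamp packed into 4 bytes.
> >             timestamps.putInt(pageIdx * 4, (int)(System.currentTimeMillis() / 1000));
> >         }
> >
> >         /** Samples K random pages and returns the least recently used data page. */
> >         public int pickVictim() {
> >             ThreadLocalRandom rnd = ThreadLocalRandom.current();
> >
> >             int victim = -1;
> >             int minTs = Integer.MAX_VALUE;
> >
> >             for (int i = 0; i < K; i++) {
> >                 int idx = rnd.nextInt(pageCnt);
> >                 int ts = timestamps.getInt(idx * 4);
> >
> >                 if (ts == 0)
> >                     continue; // Non-data page; a real implementation would re-choose it.
> >
> >                 if (ts < minTs) {
> >                     minTs = ts;
> >                     victim = idx;
> >                 }
> >             }
> >
> >             return victim; // -1 means all K candidates were non-data pages; retry.
> >         }
> >     }
> >
> > For K-random-LRU-2 the same structure would hold two timestamps per page
> > and compare the penultimate-access one in pickVictim().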
> >
> > 2) *Adaptation of CLOCK-Pro*
> > CLOCK-Pro (article
> > <https://www.usenix.org/legacy/event/usenix05/tech/general/full_papers/jiang/jiang_html/html.html>,
> > presentation <http://www.slideshare.net/huliang64/clockpro>) is a modern
> > page replacement algorithm used in actual OS kernels. It is an
> > approximation of LIRS
> > <http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf>
> > with a significant advantage for our case: instead of a doubly linked
> > list, which is hard to maintain in off-heap memory, CLOCK-Pro uses only
> > a circular list and three pointers. However, we can't use CLOCK-Pro as
> > is: there's a difference between OS paged memory and Ignite off-heap
> > storage. In an OS, if a memory page is swapped out, it still contains
> > the same data, with the same relevance and access patterns. That's why
> > keeping metadata for a swapped-out page (see the Non-resident Cold Pages
> > set) works. In Ignite, if we evict a page, all its data is discarded;
> > entries with the same keys may appear in the cache again, but they will
> > be evenly distributed across the whole storage.
> > I propose to use a simplified version of CLOCK-Pro with only two
> > pointers and two page sets: hot and cold. Non-resident pages won't be
> > tracked.
> > To implement it, we should allocate a big off-heap array to track flags
> > for each page: a (hot, cold) bit, an (accessed, not accessed) bit and
> > one bit indicating that the page is a data page, 3 bits per page in
> > total. A 100GB off-heap cache with standard 2KB pages requires ~20MB of
> > additional memory, or 50MB in case we don't want to pack metadata for
> > multiple pages into one byte.
> > The memory overhead is really small, but we can't predict the average
> > and maximum time for one eviction; it has to be determined
> > experimentally (in the hypothetical worst case, a clock hand can travel
> > through the whole array). Also, the hot/cold ratio can affect
> > performance dramatically and should be tuned well.
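> >
> > A rough sketch of the two-pointer idea, just to make the bookkeeping
> > concrete (hypothetical names, one non-packed byte of flags per page, no
> > hot/cold quota tracking and no bound on how far a hand may travel, all
> > of which a real implementation would need):
> >
> >     import java.nio.ByteBuffer;
> >
> >     public class TwoHandClock {
> >         private static final int DATA = 1;     // Page holds cache data.
> >         private static final int HOT = 2;      // Page is in the hot set.
> >         private static final int ACCESSED = 4; // Reference bit, set on each access.
> >
> >         private final ByteBuffer flags; // One byte per page (~50MB for 100GB of 2KB pages).
> >         private final int pageCnt;
> >
> >         private int coldHand; // Scans cold pages looking for an eviction victim.
> >         private int hotHand;  // Demotes hot pages when the hot set grows too large.
> >
> >         public TwoHandClock(int pageCnt) {
> >             this.pageCnt = pageCnt;
> >             flags = ByteBuffer.allocateDirect(pageCnt);
> >         }
> >
> >         public void touch(int pageIdx) {
> >             flags.put(pageIdx, (byte)(flags.get(pageIdx) | DATA | ACCESSED));
> >         }
> >
> >         /** Advances the cold hand until an unreferenced cold data page is found. */
> >         public int pickVictim() {
> >             while (true) {
> >                 int idx = coldHand;
> >                 int f = flags.get(idx);
> >
> >                 coldHand = (coldHand + 1) % pageCnt;
> >
> >                 if ((f & DATA) != 0 && (f & HOT) == 0) {
> >                     if ((f & ACCESSED) != 0)
> >                         // Referenced cold page: promote to hot, clear the reference bit.
> >                         flags.put(idx, (byte)((f | HOT) & ~ACCESSED));
> >                     else
> >                         return idx; // Cold and not referenced since the last pass: evict.
> >                 }
> >             }
> >         }
> >
> >         /** Demotes one hot page back to cold; called when the hot set exceeds its quota. */
> >         public void demoteOneHot() {
> >             while (true) {
> >                 int idx = hotHand;
> >                 int f = flags.get(idx);
> >
> >                 hotHand = (hotHand + 1) % pageCnt;
> >
> >                 if ((f & DATA) != 0 && (f & HOT) != 0) {
> >                     if ((f & ACCESSED) != 0)
> >                         flags.put(idx, (byte)(f & ~ACCESSED)); // Second chance.
> >                     else {
> >                         flags.put(idx, (byte)(f & ~HOT)); // Back to the cold set.
> >                         return;
> >                     }
> >                 }
> >             }
> >         }
> >     }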
> >
> > A note about classic "key" eviction: fair per-entry policies (FIFO, LRU,
> > Sorted) may be useful for some users, and we may want to support them as
> > well. It's no problem at all to implement them on-heap (moreover, the
> > current implementations of EvictionPolicy will work if we make minor
> > changes in the code). In off-heap memory, FIFO is easy to implement (a
> > circular buffer; see the sketch below), but LRU (and the more effective
> > LIRS, if someone wants it) requires doubly linked lists. The only option
> > I see is using our BPlusTree with some kind of implicit key, but this
> > implies O(log n) overhead on each entry read access.
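> >
> > To show how little the FIFO case needs, a toy sketch of an off-heap
> > circular buffer of entry links (hypothetical names, no overflow or
> > concurrency handling):
> >
> >     import java.nio.ByteBuffer;
> >
> >     /** Off-heap FIFO of entry links: evicts entries in insertion order. */
> >     public class EntryFifo {
> >         private final ByteBuffer ring; // 8 bytes per slot (off-heap entry link).
> >         private final int capacity;
> >
> >         private int head; // Next entry to evict.
> >         private int tail; // Next free slot.
> >         private int size;
> >
> >         public EntryFifo(int capacity) {
> >             this.capacity = capacity;
> >             ring = ByteBuffer.allocateDirect(capacity * 8);
> >         }
> >
> >         /** Called when an entry is inserted into the cache. */
> >         public void onEntryAdded(long link) {
> >             ring.putLong(tail * 8, link);
> >             tail = (tail + 1) % capacity;
> >             size++;
> >         }
> >
> >         /** Returns the oldest entry link, or -1 if the queue is empty. */
> >         public long pollEvictionCandidate() {
> >             if (size == 0)
> >                 return -1;
> >
> >             long link = ring.getLong(head * 8);
> >             head = (head + 1) % capacity;
> >             size--;
> >
> >             return link;
> >         }
> >     }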
> >
> > Another note: the current GridCacheEvictionManager has an eviction
> > backup synchronization mechanism. Do we want to support it for off-heap
> > storage as well?
> >
> > Please let me know if my understanding of any algorithm or part of the
> > system is wrong.
> >
> > --
> > Best Regards,
> > Ivan Rakov
> >
> >
>
