Mirza, sure. Will wait for your review.

пн, 25 янв. 2021 г. в 11:06, Mirza Aliev <alievmi...@gmail.com>:

> Hello, Alexey!
>
> Thank you for your effort, the feature looks great!
>
> Could you please give a bit more time to review it?
>
> Best regards,
> Mirza Aliev
>
> пт, 22 янв. 2021 г. в 10:19, Alex Plehanov <plehanov.a...@gmail.com>:
>
> > Hello, Igniters!
> >
> > Zhenya has already approved the patch that adds segmented-LRU and CLOCK
> > page replacement modes [1] (thanks for the review!). Would anyone else
> like
> > to review it?
> > I have plans to merge it next Monday if nobody else will be interested in
> > reviewing it.
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-13761
> >
> > пт, 25 дек. 2020 г. в 12:26, Alex Plehanov <plehanov.a...@gmail.com>:
> >
> > > Guys,
> > >
> > > I've implemented Segmented-LRU page replacement algorithm and
> benchmarked
> > > results, it gives some boost (5-10%) when page replacement is
> > > heavily used, but, unfortunately, when replacement is not used it gives
> > > about 2% drop compared to the current Random-LRU page replacement
> > > implementation. Due to this, I think Segmented-LRU can't be used as the
> > > only option or option by default.
> > >
> > > Also, I've implemented CLOCK page replacement algorithm (basic,
> > > not scan-resistant version) and benchmark results are much better. It
> > gives
> > > about the same performance as Segmented-LRU when page replacement is
> used
> > > and about the same performance as Random-LRU where there is no page
> > > replacement. There are advanced modifications of CLOCK algorithm
> exists,
> > > but usually, they require additional maintenance cost and we can again
> > get
> > > drop on environments without page replacements compared to Random-LRU.
> > I've
> > > written a benchmark with background full cache scans and even in this
> > case
> > > basic CLOCK version looks promising.
> > >
> > > So, my proposals:
> > > 1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to
> > the
> > > product.
> > > 2. Make page replacement algorithm configurable.
> > > 3. Recommend to use Random-LRU for environments with no page
> replacements
> > > (as it has zero maintenance cost). Recommend to use Segmented-LRU for
> > > environments with a high page replacement rate and a large amount of
> > > one-time scans (as it has near to optimal page to replace selection
> > policy
> > > and scan-resistant). Recommend to use CLOCK for all other cases (as it
> > has
> > > near to zero maintenance cost and efficiency of page replacement near
> to
> > > Segmented-LRU).
> > > 4. Set CLOCK as the default page replacement algorithm.
> > >
> > > WDYT?
> > >
> > > I've filled the IEP [1] for this discussion and create the pull request
> > > [2] for the last proposal. I would appreciate for review of the patch.
> > >
> > > [1]:
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements
> > > [2]: https://github.com/apache/ignite/pull/8513
> > >
> > > пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky
> > > <arzamas...@mail.ru.invalid>:
> > >
> > >>
> > >>
> > >> Nikolay, i hope such case ignite users already observed)
> > >> I suggest to: put data bigger then available, full scan, get data only
> > >> for available inmem data in loop, check PageReplacement metric, how
> > match
> > >> iterations will bring to zero this metric.
> > >>
> > >> >Hello, Alex.
> > >> >
> > >> >> Perhaps we can implement a special benchmark for this case with
> > >> manually triggered "batch page replacement" using yardstick (but I'm
> not
> > >> sure if ducktape can help here).
> > >> >
> > >> >I think we should carefully describe the issues with the current
> > >> approach and then we can choose right tool to benchmark improvements.
> > >> >You said:
> > >> >
> > >> >> we use Random-LRU algorithm … it has many disadvantages and affects
> > >> performance very much when replacement is started
> > >> >
> > >> >Can you list disadvantages of the Random-LRU?
> > >> >
> > >> >AFAIU the first benchmark should be:
> > >> >
> > >> >1. Start cluster with persistence and put data bigger then available
> > RAM
> > >> to it.
> > >> >2. Measure performance of the queries that selects one entry from the
> > >> cache.
> > >> >3. Make some queries that will iterate throw all data - `SELECT
> SUM(x)
> > >> FROM t` or something similar.
> > >> >4. Repeat step 2. in this time performance of random queries should
> be
> > >> lower due to the page replacement.
> > >> >
> > >> >Is this scenario correct?
> > >> >
> > >> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.a...@gmail.com
> >
> > >> написал(а):
> > >> >>
> > >> >> Nikolay, Zhenya,
> > >> >>
> > >> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random
> puts
> > >> >> uniformly. It can be used to compare different page replacement
> > >> algorithms
> > >> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to
> > >> benchmark
> > >> >> batch page replacement.
> > >> >> Perhaps we can implement a special benchmark for this case with
> > >> manually
> > >> >> triggered "batch page replacement" using yardstick (but I'm not
> sure
> > >> >> if ducktape can help here).
> > >> >>
> > >> >>> I understand case you described, but who will pull the switch ?
> > Human,
> > >> >> artificial intelligence ?
> > >> >> As I described before, we can implement both manual and automatic
> > >> "batch
> > >> >> page replacement" triggering. For automatic triggering, there is no
> > >> >> artificial intelligence needed, just several conditions with
> > reasonable
> > >> >> thresholds. Automatic triggering also can be disabled by default.
> > >> >>
> > >> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky <
> > >> arzamas...@mail.ru.invalid
> > >> >>> :
> > >> >>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>> Zhenya,
> > >> >>>>
> > >> >>>>> Alexey, we already have changes that partially fixes this issue
> > [1]
> > >> >>>> IGNITE-13086 it's a minor improvement. We still have major
> problems
> > >> with
> > >> >>>> our page replacement algorithm (slow page selection and
> non-optimal
> > >> >>>> page-fault rate). I think changing from random 5 pages to 7 will
> > make
> > >> >>>> things even worse (it's better for page-fault rate, but page
> > >> selection
> > >> >>> will
> > >> >>>> be slower).
> > >> >>> All this words above need to be proven, i hope. + 1 with Nikolay,
> we
> > >> need
> > >> >>> correct reproduces or some graphs from 2.9 ver.
> > >> >>>
> > >> >>>>
> > >> >>>>> This approach still not applicable for real life
> > >> >>>> Why do you think batch replacement is not applicable for
> real-life?
> > >> It can
> > >> >>>> be applied for workloads, where some big amount of data
> > periodically
> > >> used,
> > >> >>>> but not very often. For example, when OLAP request over
> historical
> > >> data
> > >> >>>> raised pages to page-memory, and after such request this data is
> > not
> > >> >>> needed
> > >> >>>> for a long time. Or when OLTP transactions mostly add new data
> and
> > >> process
> > >> >>>> recent data but rarely touch historical data. In these cases with
> > the
> > >> >>>> current approach, we will enter "page replacement mode" after
> some
> > >> period
> > >> >>>> of time and never leave it. With batch page replacement there is
> a
> > >> chance
> > >> >>>> to prevent random-LRU page replacement or postpone it.
> > >> >>> I understand case you described, but who will pull the switch ?
> > Human,
> > >> >>> artificial intelligence ?
> > >> >>> You approach assume some triggering from inner, i don`t like this.
> > >> >>>
> > >> >>>>
> > >> >>>>> But request once more, do you really observe such problems with
> > 2.9
> > >> ver
> > >> >>> ?
> > >> >>>> Any graphs maybe ?
> > >> >>>> I don't have production usage feedback after IGNITE-13086, but I
> > >> doubt
> > >> >>>> something changed significantly.
> > >> >>>
> > >> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes
> > >> yardstik
> > >> >>> bench for PR proven, we can use it once more.
> > >> >>>
> > >> >>> Thanks !
> > >> >>>>
> > >> >>>>
> > >> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky <
> > >> >>>  arzamas...@mail.ru.invalid
> > >> >>>>> :
> > >> >>>>
> > >> >>>>>
> > >> >>>>> Alexey, we already have changes that partially fixes this issue
> > [1]
> > >> >>>>> Easy way:
> > >> >>>>> Looks like we already have converge in page replacement.
> > >> >>>>> If we change 5 times touch iterator from random lru algo into,
> for
> > >> >>>>> example — 7 we will obtain fast improvement from scratch.
> > >> >>>>>
> > >> >>>>> » Batch page replacement
> > >> >>>>> This approach still not applicable for real life if you wan`t to
> > >> observe
> > >> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course,
> > you
> > >> >>>>> understand that dramatically reduce of such interval gives
> > nothing?
> > >> >>>>>
> > >> >>>>> » Change the page replacement algorithm.
> > >> >>>>> That`s way i vote for ) But request once more, do you really
> > observe
> > >> >>> such
> > >> >>>>> problems with 2.9 ver ? Any graphs maybe ?
> > >> >>>>>
> > >> >>>>> thanks !
> > >> >>>>>
> > >> >>>>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
> > >> >>>>>> Hello, Igniters!
> > >> >>>>>>
> > >> >>>>>> Currently, for page replacement (page rotation between
> > page-memory
> > >> and
> > >> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance
> cost
> > >> and
> > >> >>>>>> relatively simple implementation, but it has many disadvantages
> > and
> > >> >>>>> affects
> > >> >>>>>> performance very much when replacement is started. We even have
> > >> >>> warnings
> > >> >>>>> in
> > >> >>>>>> the log when page replacement started and a special event for
> > >> this. I
> > >> >>> know
> > >> >>>>>> Ignite deployments where administrators force to restart
> cluster
> > >> nodes
> > >> >>>>>> periodically to avoid page replacement.
> > >> >>>>>>
> > >> >>>>>> I have a couple of proposals to improve page replacement in
> > Ignite:
> > >> >>>>>>
> > >> >>>>>> *Batch page replacement.*
> > >> >>>>>>
> > >> >>>>>> Main idea: in some cases start background task to evict cold
> > pages
> > >> from
> > >> >>>>>> page-memory (for example, pages, last touched more than 12
> hours
> > >> ago).
> > >> >>>>>>
> > >> >>>>>> The task can be started:
> > >> >>>>>> - Automatically, triggered by some events, for example, when we
> > >> expect
> > >> >>> a
> > >> >>>>>> start of Random-LRU page replacing soon (allocated more than
> 90%
> > of
> > >> >>>>>> page-memory) + we have enough amount of cold pages (we need
> some
> > >> >>> metric to
> > >> >>>>>> calculate the number of cold pages) + some time passed since
> last
> > >> batch
> > >> >>>>>> page replacement (to avoid too much resource consumption by
> > >> background
> > >> >>>>>> batch replacement).
> > >> >>>>>> - Manually (JMX or control.sh), if an administrator wants to
> > >> control
> > >> >>> the
> > >> >>>>>> time of batch replacement more precisely (for example, to avoid
> > the
> > >> >>> start
> > >> >>>>>> of this task during peak time).
> > >> >>>>>>
> > >> >>>>>> Batch page replacement will be helpful in some workloads (when
> > some
> > >> >>> data
> > >> >>>>>> much colder than another), it can prevent the starting of
> > >> Random-LRU
> > >> >>> page
> > >> >>>>>> replacement, or if Random-LRU already started it can provide
> > >> >>> conditions to
> > >> >>>>>> stop it.
> > >> >>>>>>
> > >> >>>>>> *Change the page replacement algorithm.*
> > >> >>>>>>
> > >> >>>>>> Good page replacement algorithm should satisfy the
> requirements:
> > >> >>>>>> - low page-fault rates for typical workload
> > >> >>>>>> - low maintenance cost (low resource consumption to maintain
> > >> additional
> > >> >>>>>> structures required for page replacement)
> > >> >>>>>> - fast searching of next page for replacement
> > >> >>>>>> - sequential scans resistance (one sequential scan should not
> > >> evict all
> > >> >>>>>> relatively hot pages from page-memory)
> > >> >>>>>>
> > >> >>>>>> Our Random-LRU has low maintenance cost and sequential scan
> > >> resistant,
> > >> >>> but
> > >> >>>>>> to find the next page for replacement in the best case we scan
> 5
> > >> >>> pages, in
> > >> >>>>>> the worst case we can scan all data region segment. Also, due
> to
> > >> random
> > >> >>>>>> nature, it's not very effective in predicting the right page
> for
> > >> >>>>>> replacement to minimize the page-fault rate. And it's much time
> > >> >>> required
> > >> >>>>> to
> > >> >>>>>> totally evict old cold data.
> > >> >>>>>>
> > >> >>>>>> Usually, database management systems and operating systems use
> > >> >>>>>> modifications of LRU algorithms. These algorithms have higher
> > >> >>> maintenance
> > >> >>>>>> costs (pages list should be modified on each page access), but
> > >> often
> > >> >>> they
> > >> >>>>>> are effective from a "page-fault rate" point of view and have
> > O(1)
> > >> >>>>>> complexity for a searching page to replace. Simple LRU is not
> > >> >>> sequential
> > >> >>>>>> scan resistant, but modifications that utilize page access
> > >> frequency
> > >> >>> are
> > >> >>>>>> resistant to sequential scan.
> > >> >>>>>>
> > >> >>>>>> We can try one of the modifications of LRU as well (for
> example,
> > >> >>>>> "segmented
> > >> >>>>>> LRU" seems suitable for Ignite).
> > >> >>>>>>
> > >> >>>>>> Ignite is a memory-centric product, so "low maintenance cost"
> is
> > >> very
> > >> >>>>>> critical. And there is a risk that page replacement algorithm
> can
> > >> >>> affect
> > >> >>>>>> workloads, where page replacement is not used (enough RAM to
> > store
> > >> all
> > >> >>>>>> data). Of course, any page replacement solution should be
> > carefully
> > >> >>>>>> benchmarked.
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to
> > you, I
> > >> >>> will
> > >> >>>>>> create IEP and start implementation.
> > >> >>>>>>
> > >> >>>>>> Also, I have a draft implementation of system view to determine
> > >> how hot
> > >> >>>>> are
> > >> >>>>>> pages in page-memory [1]. I think it will be useful for any of
> > >> these
> > >> >>>>>> approaches (and even if we decide to left page replacement as
> > is).
> > >> >>>>>>
> > >> >>>>>> [1]:  https://issues.apache.org/jira/browse/IGNITE-13726
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> >
>

Reply via email to