Hello, Alexey! Thank you for your effort, the feature looks great!
Could you please give a bit more time to review it? Best regards, Mirza Aliev пт, 22 янв. 2021 г. в 10:19, Alex Plehanov <plehanov.a...@gmail.com>: > Hello, Igniters! > > Zhenya has already approved the patch that adds segmented-LRU and CLOCK > page replacement modes [1] (thanks for the review!). Would anyone else like > to review it? > I have plans to merge it next Monday if nobody else will be interested in > reviewing it. > > [1]: https://issues.apache.org/jira/browse/IGNITE-13761 > > пт, 25 дек. 2020 г. в 12:26, Alex Plehanov <plehanov.a...@gmail.com>: > > > Guys, > > > > I've implemented Segmented-LRU page replacement algorithm and benchmarked > > results, it gives some boost (5-10%) when page replacement is > > heavily used, but, unfortunately, when replacement is not used it gives > > about 2% drop compared to the current Random-LRU page replacement > > implementation. Due to this, I think Segmented-LRU can't be used as the > > only option or option by default. > > > > Also, I've implemented CLOCK page replacement algorithm (basic, > > not scan-resistant version) and benchmark results are much better. It > gives > > about the same performance as Segmented-LRU when page replacement is used > > and about the same performance as Random-LRU where there is no page > > replacement. There are advanced modifications of CLOCK algorithm exists, > > but usually, they require additional maintenance cost and we can again > get > > drop on environments without page replacements compared to Random-LRU. > I've > > written a benchmark with background full cache scans and even in this > case > > basic CLOCK version looks promising. > > > > So, my proposals: > > 1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to > the > > product. > > 2. Make page replacement algorithm configurable. > > 3. Recommend to use Random-LRU for environments with no page replacements > > (as it has zero maintenance cost). Recommend to use Segmented-LRU for > > environments with a high page replacement rate and a large amount of > > one-time scans (as it has near to optimal page to replace selection > policy > > and scan-resistant). Recommend to use CLOCK for all other cases (as it > has > > near to zero maintenance cost and efficiency of page replacement near to > > Segmented-LRU). > > 4. Set CLOCK as the default page replacement algorithm. > > > > WDYT? > > > > I've filled the IEP [1] for this discussion and create the pull request > > [2] for the last proposal. I would appreciate for review of the patch. > > > > [1]: > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements > > [2]: https://github.com/apache/ignite/pull/8513 > > > > пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky > > <arzamas...@mail.ru.invalid>: > > > >> > >> > >> Nikolay, i hope such case ignite users already observed) > >> I suggest to: put data bigger then available, full scan, get data only > >> for available inmem data in loop, check PageReplacement metric, how > match > >> iterations will bring to zero this metric. > >> > >> >Hello, Alex. > >> > > >> >> Perhaps we can implement a special benchmark for this case with > >> manually triggered "batch page replacement" using yardstick (but I'm not > >> sure if ducktape can help here). > >> > > >> >I think we should carefully describe the issues with the current > >> approach and then we can choose right tool to benchmark improvements. > >> >You said: > >> > > >> >> we use Random-LRU algorithm … it has many disadvantages and affects > >> performance very much when replacement is started > >> > > >> >Can you list disadvantages of the Random-LRU? > >> > > >> >AFAIU the first benchmark should be: > >> > > >> >1. Start cluster with persistence and put data bigger then available > RAM > >> to it. > >> >2. Measure performance of the queries that selects one entry from the > >> cache. > >> >3. Make some queries that will iterate throw all data - `SELECT SUM(x) > >> FROM t` or something similar. > >> >4. Repeat step 2. in this time performance of random queries should be > >> lower due to the page replacement. > >> > > >> >Is this scenario correct? > >> > > >> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.a...@gmail.com > > >> написал(а): > >> >> > >> >> Nikolay, Zhenya, > >> >> > >> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts > >> >> uniformly. It can be used to compare different page replacement > >> algorithms > >> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to > >> benchmark > >> >> batch page replacement. > >> >> Perhaps we can implement a special benchmark for this case with > >> manually > >> >> triggered "batch page replacement" using yardstick (but I'm not sure > >> >> if ducktape can help here). > >> >> > >> >>> I understand case you described, but who will pull the switch ? > Human, > >> >> artificial intelligence ? > >> >> As I described before, we can implement both manual and automatic > >> "batch > >> >> page replacement" triggering. For automatic triggering, there is no > >> >> artificial intelligence needed, just several conditions with > reasonable > >> >> thresholds. Automatic triggering also can be disabled by default. > >> >> > >> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky < > >> arzamas...@mail.ru.invalid > >> >>> : > >> >> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>>> Zhenya, > >> >>>> > >> >>>>> Alexey, we already have changes that partially fixes this issue > [1] > >> >>>> IGNITE-13086 it's a minor improvement. We still have major problems > >> with > >> >>>> our page replacement algorithm (slow page selection and non-optimal > >> >>>> page-fault rate). I think changing from random 5 pages to 7 will > make > >> >>>> things even worse (it's better for page-fault rate, but page > >> selection > >> >>> will > >> >>>> be slower). > >> >>> All this words above need to be proven, i hope. + 1 with Nikolay, we > >> need > >> >>> correct reproduces or some graphs from 2.9 ver. > >> >>> > >> >>>> > >> >>>>> This approach still not applicable for real life > >> >>>> Why do you think batch replacement is not applicable for real-life? > >> It can > >> >>>> be applied for workloads, where some big amount of data > periodically > >> used, > >> >>>> but not very often. For example, when OLAP request over historical > >> data > >> >>>> raised pages to page-memory, and after such request this data is > not > >> >>> needed > >> >>>> for a long time. Or when OLTP transactions mostly add new data and > >> process > >> >>>> recent data but rarely touch historical data. In these cases with > the > >> >>>> current approach, we will enter "page replacement mode" after some > >> period > >> >>>> of time and never leave it. With batch page replacement there is a > >> chance > >> >>>> to prevent random-LRU page replacement or postpone it. > >> >>> I understand case you described, but who will pull the switch ? > Human, > >> >>> artificial intelligence ? > >> >>> You approach assume some triggering from inner, i don`t like this. > >> >>> > >> >>>> > >> >>>>> But request once more, do you really observe such problems with > 2.9 > >> ver > >> >>> ? > >> >>>> Any graphs maybe ? > >> >>>> I don't have production usage feedback after IGNITE-13086, but I > >> doubt > >> >>>> something changed significantly. > >> >>> > >> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes > >> yardstik > >> >>> bench for PR proven, we can use it once more. > >> >>> > >> >>> Thanks ! > >> >>>> > >> >>>> > >> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky < > >> >>> arzamas...@mail.ru.invalid > >> >>>>> : > >> >>>> > >> >>>>> > >> >>>>> Alexey, we already have changes that partially fixes this issue > [1] > >> >>>>> Easy way: > >> >>>>> Looks like we already have converge in page replacement. > >> >>>>> If we change 5 times touch iterator from random lru algo into, for > >> >>>>> example — 7 we will obtain fast improvement from scratch. > >> >>>>> > >> >>>>> » Batch page replacement > >> >>>>> This approach still not applicable for real life if you wan`t to > >> observe > >> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course, > you > >> >>>>> understand that dramatically reduce of such interval gives > nothing? > >> >>>>> > >> >>>>> » Change the page replacement algorithm. > >> >>>>> That`s way i vote for ) But request once more, do you really > observe > >> >>> such > >> >>>>> problems with 2.9 ver ? Any graphs maybe ? > >> >>>>> > >> >>>>> thanks ! > >> >>>>> > >> >>>>> [1] https://issues.apache.org/jira/browse/IGNITE-13086 > >> >>>>>> Hello, Igniters! > >> >>>>>> > >> >>>>>> Currently, for page replacement (page rotation between > page-memory > >> and > >> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost > >> and > >> >>>>>> relatively simple implementation, but it has many disadvantages > and > >> >>>>> affects > >> >>>>>> performance very much when replacement is started. We even have > >> >>> warnings > >> >>>>> in > >> >>>>>> the log when page replacement started and a special event for > >> this. I > >> >>> know > >> >>>>>> Ignite deployments where administrators force to restart cluster > >> nodes > >> >>>>>> periodically to avoid page replacement. > >> >>>>>> > >> >>>>>> I have a couple of proposals to improve page replacement in > Ignite: > >> >>>>>> > >> >>>>>> *Batch page replacement.* > >> >>>>>> > >> >>>>>> Main idea: in some cases start background task to evict cold > pages > >> from > >> >>>>>> page-memory (for example, pages, last touched more than 12 hours > >> ago). > >> >>>>>> > >> >>>>>> The task can be started: > >> >>>>>> - Automatically, triggered by some events, for example, when we > >> expect > >> >>> a > >> >>>>>> start of Random-LRU page replacing soon (allocated more than 90% > of > >> >>>>>> page-memory) + we have enough amount of cold pages (we need some > >> >>> metric to > >> >>>>>> calculate the number of cold pages) + some time passed since last > >> batch > >> >>>>>> page replacement (to avoid too much resource consumption by > >> background > >> >>>>>> batch replacement). > >> >>>>>> - Manually (JMX or control.sh), if an administrator wants to > >> control > >> >>> the > >> >>>>>> time of batch replacement more precisely (for example, to avoid > the > >> >>> start > >> >>>>>> of this task during peak time). > >> >>>>>> > >> >>>>>> Batch page replacement will be helpful in some workloads (when > some > >> >>> data > >> >>>>>> much colder than another), it can prevent the starting of > >> Random-LRU > >> >>> page > >> >>>>>> replacement, or if Random-LRU already started it can provide > >> >>> conditions to > >> >>>>>> stop it. > >> >>>>>> > >> >>>>>> *Change the page replacement algorithm.* > >> >>>>>> > >> >>>>>> Good page replacement algorithm should satisfy the requirements: > >> >>>>>> - low page-fault rates for typical workload > >> >>>>>> - low maintenance cost (low resource consumption to maintain > >> additional > >> >>>>>> structures required for page replacement) > >> >>>>>> - fast searching of next page for replacement > >> >>>>>> - sequential scans resistance (one sequential scan should not > >> evict all > >> >>>>>> relatively hot pages from page-memory) > >> >>>>>> > >> >>>>>> Our Random-LRU has low maintenance cost and sequential scan > >> resistant, > >> >>> but > >> >>>>>> to find the next page for replacement in the best case we scan 5 > >> >>> pages, in > >> >>>>>> the worst case we can scan all data region segment. Also, due to > >> random > >> >>>>>> nature, it's not very effective in predicting the right page for > >> >>>>>> replacement to minimize the page-fault rate. And it's much time > >> >>> required > >> >>>>> to > >> >>>>>> totally evict old cold data. > >> >>>>>> > >> >>>>>> Usually, database management systems and operating systems use > >> >>>>>> modifications of LRU algorithms. These algorithms have higher > >> >>> maintenance > >> >>>>>> costs (pages list should be modified on each page access), but > >> often > >> >>> they > >> >>>>>> are effective from a "page-fault rate" point of view and have > O(1) > >> >>>>>> complexity for a searching page to replace. Simple LRU is not > >> >>> sequential > >> >>>>>> scan resistant, but modifications that utilize page access > >> frequency > >> >>> are > >> >>>>>> resistant to sequential scan. > >> >>>>>> > >> >>>>>> We can try one of the modifications of LRU as well (for example, > >> >>>>> "segmented > >> >>>>>> LRU" seems suitable for Ignite). > >> >>>>>> > >> >>>>>> Ignite is a memory-centric product, so "low maintenance cost" is > >> very > >> >>>>>> critical. And there is a risk that page replacement algorithm can > >> >>> affect > >> >>>>>> workloads, where page replacement is not used (enough RAM to > store > >> all > >> >>>>>> data). Of course, any page replacement solution should be > carefully > >> >>>>>> benchmarked. > >> >>>>>> > >> >>>>>> > >> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to > you, I > >> >>> will > >> >>>>>> create IEP and start implementation. > >> >>>>>> > >> >>>>>> Also, I have a draft implementation of system view to determine > >> how hot > >> >>>>> are > >> >>>>>> pages in page-memory [1]. I think it will be useful for any of > >> these > >> >>>>>> approaches (and even if we decide to left page replacement as > is). > >> >>>>>> > >> >>>>>> [1]: https://issues.apache.org/jira/browse/IGNITE-13726 > >> >>>>>> > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>> > >> >>> > >> >>> > >> >>> > >> >>> > >> > > >> > >> > >> > >> > > > > >