Hello Stas! I didn't quite get your last idea. What would we do if we reach getMaxWalArchiveSize? Should we still refuse to delete a segment until minWalArchiveTimespan has passed?
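To make sure I understand the interplay, here is a minimal sketch of one possible interpretation, where the byte cap always wins over the timespan. All names here (WalArchiveRetention, canDelete, minWalArchiveTimespanMs) are hypothetical, sketching the idea rather than any existing API:

    import java.util.concurrent.TimeUnit;

    /** Hypothetical helper: decides whether an archived WAL segment may be deleted. */
    public class WalArchiveRetention {
        /** Proposed time-based retention, in milliseconds. */
        private final long minWalArchiveTimespanMs;

        public WalArchiveRetention(long timespan, TimeUnit unit) {
            minWalArchiveTimespanMs = unit.toMillis(timespan);
        }

        /**
         * @param segmentCpStartTs Timestamp of the checkpoint start marker covering the segment.
         * @param archiveSize Current WAL archive size in bytes.
         * @param maxWalArchiveSize Hard byte limit of the archive.
         */
        public boolean canDelete(long segmentCpStartTs, long archiveSize, long maxWalArchiveSize) {
            // The byte cap always wins: never let the archive outgrow the hard limit.
            if (archiveSize > maxWalArchiveSize)
                return true;

            // Otherwise keep the segment until it is older than the requested timespan.
            return System.currentTimeMillis() - segmentCpStartTs > minWalArchiveTimespanMs;
        }
    }

Is that the behaviour you had in mind, or would the timespan take precedence over the byte limit?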
06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> An interesting suggestion I heard today.
>
> The minWalArchiveSize property might actually be minWalArchiveTimespan, i.e.
> a number of seconds instead of a number of bytes!
>
> I think this makes perfect sense from the user's point of view:
> "I want to have a WAL archive for at least N hours, but I have a limit of M
> gigabytes to store it".
>
> Do we have the checkpoint timestamp stored anywhere? (cp start markers?)
> Perhaps we can actually implement this?
>
> Thanks,
> Stan
>
>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
>>
>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>> +1 to add a public property to replace
>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>
>> I don't like the name getWalArchiveSize - I think it's a bit confusing (is
>> it the current size? the minimal size? the target size?)
>> I suggest naming the property getMinWalArchiveSize. I think that this is
>> exactly what it is - the minimal size of the archive that we want to have.
>> The archive size should at all times be between min and max.
>> If the archive size is less than min or more than max, then system
>> functionality can degrade (e.g. historical rebalance may not work as
>> expected).
>> I think these rules are intuitively understood from the "min" and "max"
>> names.
>>
>> Ilya's suggestion about throttling is great, although I'd do it in a
>> different ticket.
>>
>> Thanks,
>> Stan
>>
>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>
>>> Hello, Kirill
>>>
>>> +1 for this change; however, there are already too many configuration
>>> settings for the user to tune in an Ignite cluster. It is better to
>>> keep the options that we already have and fix the behaviour of the
>>> rebalance process as you suggested.
>>>
>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>> Hi Ilya!
>>>>
>>>> Then we would greatly reduce the user load on the cluster until the
>>>> rebalance is over, which can be critical for the user.
>>>>
>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>> Hello!
>>>>>
>>>>> Maybe we can have a mechanic here similar (or equal) to checkpoint-based
>>>>> write throttling?
>>>>>
>>>>> So we would throttle on both the checkpoint page buffer and the WAL
>>>>> limit.
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>
>>>>>> Hello everybody!
>>>>>>
>>>>>> At the moment, if there are partitions for which historical rebalance
>>>>>> will be used, we reserve segments in the WAL archive (i.e. we do not
>>>>>> allow the WAL archive to be cleaned) until the rebalance for all
>>>>>> cache groups is over.
>>>>>>
>>>>>> If the cluster is under load during the rebalance, the WAL archive
>>>>>> size may significantly exceed the limit set in
>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>> complete. This may cause problems for users, and nodes may crash
>>>>>> with a "No space left on device" error.
>>>>>>
>>>>>> We have a system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>> (0.5 by default) that sets the threshold (multiplied by
>>>>>> getMaxWalArchiveSize) from which and down to which the WAL archive
>>>>>> will be cleared, i.e. the size of the WAL archive that will always
>>>>>> remain on the node.
>>>>>>
>>>>>> I propose to replace this system property with
>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes, with the default
>>>>>> (getMaxWalArchiveSize * 0.5) as it is now.
>>>>>>
>>>>>> Main proposal:
>>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>> existing WAL segment reservations and do not grant new ones until we
>>>>>> get back down to DataStorageConfiguration#getWalArchiveSize. In this
>>>>>> case, if a segment needed for historical rebalance is no longer
>>>>>> available, we will automatically switch to full rebalance.
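To make the main proposal concrete, here is a hedged configuration sketch. setMaxWalArchiveSize is the existing API; setMinWalArchiveSize is only the proposed property under the name suggested in this thread, so it is shown commented out, and the sizes are arbitrary examples:

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WalArchiveConfigSketch {
        public static void main(String[] args) {
            DataStorageConfiguration dsCfg = new DataStorageConfiguration()
                // Existing hard cap: per the proposal, reaching it cancels WAL segment
                // reservations, and historical rebalance falls back to full rebalance.
                .setMaxWalArchiveSize(20L * 1024 * 1024 * 1024); // 20 GB

            // Proposed public property (hypothetical setter, not yet in the API):
            // once cleaning starts, the archive is cleared down to this size and no
            // further. The default would be getMaxWalArchiveSize() * 0.5, matching
            // today's IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE behaviour.
            // dsCfg.setMinWalArchiveSize(10L * 1024 * 1024 * 1024); // 10 GB

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(dsCfg);

            // Ignition.start(cfg);
        }
    }

With this shape, the archive size oscillates between min and max under load instead of growing without bound while segments are reserved for historical rebalance.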