Hello Stas! I didn't quite get your last idea. What would we do if we reach getMaxWalArchiveSize? Should we still refuse to delete a segment until minWalArchiveTimespan has passed?
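To make sure I understand the interplay, here is a minimal sketch of one possible interpretation, where the byte cap always wins over the timespan. All names here (WalArchiveRetention, canDelete, minWalArchiveTimespanMs) are hypothetical, sketching the idea rather than any existing API:

    import java.util.concurrent.TimeUnit;

    /** Hypothetical helper: decides whether an archived WAL segment may be deleted. */
    public class WalArchiveRetention {
        /** Proposed time-based retention, in milliseconds. */
        private final long minWalArchiveTimespanMs;

        public WalArchiveRetention(long timespan, TimeUnit unit) {
            minWalArchiveTimespanMs = unit.toMillis(timespan);
        }

        /**
         * @param segmentCpStartTs Timestamp of the checkpoint start marker covering the segment.
         * @param archiveSize Current WAL archive size in bytes.
         * @param maxWalArchiveSize Hard byte limit of the archive.
         */
        public boolean canDelete(long segmentCpStartTs, long archiveSize, long maxWalArchiveSize) {
            // The byte cap always wins: never let the archive outgrow the hard limit.
            if (archiveSize > maxWalArchiveSize)
                return true;

            // Otherwise keep the segment until it is older than the requested timespan.
            return System.currentTimeMillis() - segmentCpStartTs > minWalArchiveTimespanMs;
        }
    }

Is that the behaviour you had in mind, or would the timespan take precedence over the byte limit?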
06.05.2021, 20:00, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> An interesting suggestion I heard today.
>
> The minWalArchiveSize property might actually be minWalArchiveTimespan, i.e.
> a number of seconds instead of a number of bytes!
>
> I think this makes perfect sense from the user's point of view:
> "I want to have a WAL archive for at least N hours, but I have a limit of M
> gigabytes to store it".
>
> Do we have the checkpoint timestamp stored anywhere? (cp start markers?)
> Perhaps we can actually implement this?
>
> Thanks,
> Stan
>
>> On 6 May 2021, at 14:13, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
>>
>> +1 to cancel WAL reservation on reaching getMaxWalArchiveSize
>> +1 to add a public property to replace
>> IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>
>> I don't like the name getWalArchiveSize - I think it's a bit confusing (is
>> it the current size? the minimal size? the target size?)
>> I suggest naming the property getMinWalArchiveSize. I think that this is
>> exactly what it is - the minimal size of the archive that we want to have.
>> The archive size should at all times be between min and max.
>> If the archive size is less than min or more than max, then system
>> functionality can degrade (e.g. historical rebalance may not work as
>> expected).
>> I think these rules are intuitively understood from the "min" and "max"
>> names.
>>
>> Ilya's suggestion about throttling is great, although I'd do it in a
>> different ticket.
>>
>> Thanks,
>> Stan
>>
>>> On 5 May 2021, at 19:25, Maxim Muzafarov <mmu...@apache.org> wrote:
>>>
>>> Hello, Kirill
>>>
>>> +1 for this change; however, there are already too many configuration
>>> settings for the user to tune in an Ignite cluster. It is better to
>>> keep the options that we already have and fix the behaviour of the
>>> rebalance process as you suggested.
>>>
>>> On Tue, 4 May 2021 at 19:01, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
>>>> Hi Ilya!
>>>>
>>>> Then we would greatly reduce the user load on the cluster until the
>>>> rebalance is over, which can be critical for the user.
>>>>
>>>> 04.05.2021, 18:43, "Ilya Kasnacheev" <ilya.kasnach...@gmail.com>:
>>>>> Hello!
>>>>>
>>>>> Maybe we can have a mechanic here similar (or equal) to checkpoint-based
>>>>> write throttling?
>>>>>
>>>>> So we would throttle on both the checkpoint page buffer and the WAL
>>>>> limit.
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>> Tue, 4 May 2021 at 11:29, ткаленко кирилл <tkalkir...@yandex.ru>:
>>>>>
>>>>>> Hello everybody!
>>>>>>
>>>>>> At the moment, if there are partitions for which historical rebalance
>>>>>> will be used, we reserve segments in the WAL archive (i.e. we do not
>>>>>> allow the WAL archive to be cleaned) until the rebalance for all
>>>>>> cache groups is over.
>>>>>>
>>>>>> If the cluster is under load during the rebalance, the WAL archive
>>>>>> size may significantly exceed the limit set in
>>>>>> DataStorageConfiguration#getMaxWalArchiveSize until the process is
>>>>>> complete. This may cause problems for users, and nodes may crash
>>>>>> with a "No space left on device" error.
>>>>>>
>>>>>> We have a system property IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE
>>>>>> (0.5 by default) that sets the threshold (multiplied by
>>>>>> getMaxWalArchiveSize) from which and down to which the WAL archive
>>>>>> will be cleared, i.e. the size of the WAL archive that will always
>>>>>> remain on the node.
>>>>>>
>>>>>> I propose to replace this system property with
>>>>>> DataStorageConfiguration#getWalArchiveSize in bytes, with the default
>>>>>> (getMaxWalArchiveSize * 0.5) as it is now.
>>>>>>
>>>>>> Main proposal:
>>>>>> When DataStorageConfiguration#getMaxWalArchiveSize is reached, cancel
>>>>>> existing WAL segment reservations and do not grant new ones until we
>>>>>> get back down to DataStorageConfiguration#getWalArchiveSize. In this
>>>>>> case, if a segment needed for historical rebalance is no longer
>>>>>> available, we will automatically switch to full rebalance.
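To make the main proposal concrete, here is a hedged configuration sketch. setMaxWalArchiveSize is the existing API; setMinWalArchiveSize is only the proposed property under the name suggested in this thread, so it is shown commented out, and the sizes are arbitrary examples:

    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class WalArchiveConfigSketch {
        public static void main(String[] args) {
            DataStorageConfiguration dsCfg = new DataStorageConfiguration()
                // Existing hard cap: per the proposal, reaching it cancels WAL segment
                // reservations, and historical rebalance falls back to full rebalance.
                .setMaxWalArchiveSize(20L * 1024 * 1024 * 1024); // 20 GB

            // Proposed public property (hypothetical setter, not yet in the API):
            // once cleaning starts, the archive is cleared down to this size and no
            // further. The default would be getMaxWalArchiveSize() * 0.5, matching
            // today's IGNITE_THRESHOLD_WAL_ARCHIVE_SIZE_PERCENTAGE behaviour.
            // dsCfg.setMinWalArchiveSize(10L * 1024 * 1024 * 1024); // 10 GB

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(dsCfg);

            // Ignition.start(cfg);
        }
    }

With this shape, the archive size oscillates between min and max under load instead of growing without bound while segments are reserved for historical rebalance.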