[jira] [Updated] (IGNITE-22759) Do not do partition SafeTime sync if previous attempt is not finished

Roman Puchkovskiy (Jira) Wed, 17 Jul 2024 03:45:04 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-22759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Puchkovskiy updated IGNITE-22759:
---------------------------------------
    Description: 
There is a scheduled task that periodically performs a 'partition SafeTime 
sync' for each primary replica living on the node. For each such replica, we do 
the following:
 # Take the current time from the node clock ('now')
 # Wait till the Metastorage SafeTime reaches 'now'
 # Make sure the replica is still primary
 # Execute the partition SafeTime sync logic

Step 2 is implemented by installing a future on a 
PendingComparableValuesTracker representing the Metastorage SafeTime. If, for 
some reason, the Metastorage SafeTime lags behind the node clock, a few (or 
many) futures might be installed at the same time for the same partition. When 
there are many partitions, this leads to a huge number of futures, most of 
which are useless (only the most recent one matters for each partition). 
This increases the amount of garbage. If the node is already struggling to 
handle the load, this will finish the node off, as it will increase the GC 
pressure drastically. The node will choke itself into an OutOfMemory situation.

It is suggested to execute steps 1-4 only if the previous future has already 
finished. We might lose one partition SafeTime update, but in a situation where 
the node is already struggling (as the Metastorage SafeTime lags) this will 
probably not be noticed.
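
The suggested guard could be sketched roughly as follows. This is only an 
illustration, not the actual Ignite code: the class SafeTimeSyncGuard, the 
method trySync and all parameter names are hypothetical, and the 
PendingComparableValuesTracker is represented by a plain function that returns 
a future completing when the Metastorage SafeTime reaches the given timestamp.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Hypothetical sketch: at most one pending SafeTime sync attempt per partition. */
final class SafeTimeSyncGuard {
    /** Last started attempt for each partition (finished or still waiting). */
    private final Map<Integer, CompletableFuture<Void>> lastAttempt = new ConcurrentHashMap<>();

    /** Node clock (assumption: timestamps are plain longs here). */
    private final LongSupplier clock;

    /** Stand-in for the tracker: future that completes when Metastorage SafeTime reaches the timestamp. */
    private final LongFunction<CompletableFuture<Void>> waitForMetastorageSafeTime;

    SafeTimeSyncGuard(LongSupplier clock, LongFunction<CompletableFuture<Void>> waitForMetastorageSafeTime) {
        this.clock = clock;
        this.waitForMetastorageSafeTime = waitForMetastorageSafeTime;
    }

    /**
     * Starts steps 1-4 for the partition unless the previous attempt is still pending.
     * The sync action must not call back into this guard for the same partition.
     *
     * @return true if a new attempt was started, false if this round was skipped.
     */
    boolean trySync(int partitionId, BooleanSupplier stillPrimary, Runnable syncAction) {
        boolean[] started = {false};

        lastAttempt.compute(partitionId, (id, previous) -> {
            if (previous != null && !previous.isDone()) {
                // Previous attempt has not finished: do not install another future.
                return previous;
            }

            started[0] = true;
            long now = clock.getAsLong();                      // step 1

            return waitForMetastorageSafeTime.apply(now)       // step 2
                    .thenRun(() -> {
                        if (stillPrimary.getAsBoolean()) {     // step 3
                            syncAction.run();                  // step 4
                        }
                    });
        });

        return started[0];
    }
}
```

With such a guard, the scheduled task would call trySync for every primary 
replica on each tick, so at most one future per partition is waiting on the 
tracker at any time, regardless of how far the Metastorage SafeTime lags.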

> Do not do partition SafeTime sync if previous attempt is not finished
> ---------------------------------------------------------------------
>
>                 Key: IGNITE-22759
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22759
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
