[jira] [Comment Edited] (IGNITE-23550) Test and optimize metastorage snapshot transfer and recovery speed for new nodes

Kirill Tkalenko (Jira) Sat, 07 Dec 2024 02:53:41 -0800


    [ 
https://issues.apache.org/jira/browse/IGNITE-23550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903806#comment-17903806
 ]


Kirill Tkalenko edited comment on IGNITE-23550 at 12/7/24 10:52 AM:
--------------------------------------------------------------------

Based on the results of running the tests from [PR 
4845|https://github.com/apache/ignite-3/pull/4845] locally and on TC, I drew 
several conclusions.

As the JFR analysis has shown, we spend quite a lot of time saving the 
checksum on each write to the metastorage, since it is written to the WAL in 
synchronous mode.
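
To illustrate why a synchronous write on every put is costly, here is a 
minimal sketch that uses plain java.nio instead of the actual metastorage WAL 
code. The class SyncCostSketch and its method are hypothetical. It only 
contrasts forcing every record to disk with forcing once at the end, and the 
timings it prints depend entirely on the disk and file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: not Ignite code, only an illustration of per-write sync cost. */
final class SyncCostSketch {
    /** Appends {@code count} 8-byte records and returns the elapsed time in nanoseconds. */
    static long appendRecords(Path file, int count, boolean syncEachWrite) throws IOException {
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            for (int i = 0; i < count; i++) {
                buf.clear();
                buf.putLong(i);
                buf.flip();
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                if (syncEachWrite) {
                    ch.force(false); // flush to the storage device after every record
                }
            }
            ch.force(false); // one final flush in both modes
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sync-cost");
        long synced = appendRecords(dir.resolve("synced.bin"), 1_000, true);
        long batched = appendRecords(dir.resolve("batched.bin"), 1_000, false);
        System.out.printf("sync on each write: %d ms, sync once: %d ms%n",
                synced / 1_000_000, batched / 1_000_000);
    }
}
```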

Results of executing *MetaStorageManager#put* 100k times with and without 
sync mode for the checksum:
||Disable sync for checksum||TC/Local||Time||
|True|TC|4s 12ms 204us 192866ns, totalMs=4012, totalNs=4012204866|
|False|TC|22s 12ms 732us 720751ns, totalMs=22012, totalNs=22012732751|
|True|Local|2s 218ms 965us 747459ns, totalMs=2218, totalNs=2218965459|
|False|Local|5m 10s 157ms 117us, totalMs=310157, totalNs=310157117417|
From the table we can conclude that disabling sync with the WAL for the 
checksum makes the puts about 5.5 times faster on TC and about 140 times 
faster locally. How can we optimize it? I think we can hook into the mechanism 
of sending raft commands and write the required checksum as an additional 
field of the command, then work with it from there, since raft commands are 
already written to the WAL in sync mode.
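
The idea of carrying the checksum inside the raft command could look roughly 
like the sketch below. It is written under assumptions and does not use the 
actual Ignite command classes: ChecksumInCommandSketch, PutCommand, its 
checksum field and the CRC32-based chaining are all hypothetical, and the 
real metastorage checksum algorithm may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch; these are not the real Ignite raft command classes. */
final class ChecksumInCommandSketch {
    /** A put command that carries its checksum as an extra field (assumed layout). */
    static final class PutCommand {
        final byte[] key;
        final byte[] value;
        final long checksum;

        PutCommand(byte[] key, byte[] value, long checksum) {
            this.key = key;
            this.value = value;
            this.checksum = checksum;
        }
    }

    /** Chains the previous checksum with the new payload; CRC32 is used only for illustration. */
    static long nextChecksum(long previous, byte[] key, byte[] value) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(previous).array());
        crc.update(key);
        crc.update(value);
        return crc.getValue();
    }

    /** Built before the command is sent, so the raft log's own synced write persists the checksum. */
    static PutCommand newPut(long previousChecksum, String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return new PutCommand(k, v, nextChecksum(previousChecksum, k, v));
    }

    /** Run when the command is applied or replayed; no separate synced checksum write is needed. */
    static boolean verify(long previousChecksum, PutCommand cmd) {
        return cmd.checksum == nextChecksum(previousChecksum, cmd.key, cmd.value);
    }
}
```

With this layout the storage would only compare checksums while applying 
commands, and the durability of the checksum would come from the raft log 
entry itself.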

Results of a node restart with and without sync mode for the checksum, with 
100k puts in the raft log:
||Disable sync for checksum||TC/Local||Time||
|True|TC|4s 744ms 454us, totalMs=4744, totalNs=4744454870|
|False|TC|24s 195ms 34us, totalMs=24195, totalNs=24195034186|
|True|Local|2s 877ms 511us, totalMs=2877, totalNs=2877511875|
|False|Local|6m 41s 771ms 593us, totalMs=401771, totalNs=401771593084|
This table also shows that there will be a performance gain if we avoid the 
synchronous checksum write.
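
The size of the gain can be computed from the totalMs values of the two 
tables above. The small helper below is only a worked calculation with a 
hypothetical class name; the numbers are copied from the tables.

```java
/** Hypothetical helper: computes speedup ratios from the totalMs columns. */
final class SpeedupSketch {
    /** Ratio of the time with checksum sync enabled to the time with it disabled. */
    static double speedup(long syncEnabledMs, long syncDisabledMs) {
        return (double) syncEnabledMs / syncDisabledMs;
    }

    public static void main(String[] args) {
        // totalMs values copied from the put and restart tables.
        System.out.printf("put, TC:        %.1fx%n", speedup(22012, 4012));
        System.out.printf("put, Local:     %.1fx%n", speedup(310157, 2218));
        System.out.printf("restart, TC:    %.1fx%n", speedup(24195, 4744));
        System.out.printf("restart, Local: %.1fx%n", speedup(401771, 2877));
    }
}
```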

Now let's compare node restart times with and without a snapshot taken before 
the restart:
||TC/Local||Disable sync for checksum||Snapshot before restart||Time||
|TC|True|True|1s 890ms 270us, totalMs=1890, totalNs=1890270734|
|TC|True|False|4s 744ms 454us, totalMs=4744, totalNs=4744454870|
|TC|False|True|1s 836ms 376us, totalMs=1836, totalNs=1836376733|
|TC|False|False|24s 195ms 34us, totalMs=24195, totalNs=24195034186|
|Local|True|True|955ms 622us, totalMs=955, totalNs=955622750|
|Local|True|False|2s 877ms 511us, totalMs=2877, totalNs=2877511875|
|Local|False|True|804ms 878us 74708ns, totalMs=804, totalNs=804878708|
|Local|False|False|6m 41s 771ms 593us, totalMs=401771, totalNs=401771593084|
It can be concluded that taking a snapshot before restarting a node makes the 
node start much faster: under 2 seconds on TC and under 1 second locally, 
regardless of the checksum sync mode.
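
A plausible reason for this result is that with a snapshot the node only 
loads the saved state and replays the log entries written after it, instead 
of re-applying all 100k puts. The sketch below illustrates that general 
snapshot-plus-log-tail recovery scheme. It is not Ignite's raft 
implementation, and the names SnapshotRecoverySketch, LogEntry and Snapshot 
are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of snapshot-based recovery; not Ignite's raft implementation. */
final class SnapshotRecoverySketch {
    /** One replicated put together with its log index. */
    static final class LogEntry {
        final long index;
        final String key;
        final String value;

        LogEntry(long index, String key, String value) {
            this.index = index;
            this.key = key;
            this.value = value;
        }
    }

    /** State captured up to and including {@code lastIndex}. */
    static final class Snapshot {
        final long lastIndex;
        final Map<String, String> state;

        Snapshot(long lastIndex, Map<String, String> state) {
            this.lastIndex = lastIndex;
            this.state = state;
        }
    }

    /** Restores the state into {@code target} and returns how many log entries were replayed. */
    static int recover(Snapshot snapshot, List<LogEntry> log, Map<String, String> target) {
        long appliedIndex = 0;
        if (snapshot != null) {
            target.putAll(snapshot.state);
            appliedIndex = snapshot.lastIndex;
        }
        int replayed = 0;
        for (LogEntry e : log) {
            // A real log would be truncated after the snapshot; here covered entries are skipped.
            if (e.index > appliedIndex) {
                target.put(e.key, e.value);
                replayed++;
            }
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        for (long i = 1; i <= 100_000; i++) {
            log.add(new LogEntry(i, "key" + (i % 1000), "value" + i));
        }
        Map<String, String> full = new HashMap<>();
        int withoutSnapshot = recover(null, log, full);
        int withSnapshot = recover(new Snapshot(100_000, full), log, new HashMap<>());
        System.out.println("replayed without snapshot: " + withoutSnapshot
                + ", with snapshot: " + withSnapshot);
    }
}
```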

Here is how long it takes to create and restore a snapshot for a 200 MB 
storage:
||TC/Local||do/restore snapshot||Time||
|TC|do|1s 911ms 127us, totalMs=1911, totalNs=1911127356|
|TC|restore|179ms 976us 797710ns, totalMs=179, totalNs=179976710|
|Local|do|799ms 362us, totalMs=799, totalNs=799362375|
|Local|restore|105ms 686us 581542ns, totalMs=105, totalNs=105686542|
It can be concluded that creating a snapshot and restoring from it can provide 
a performance boost. Note, however, that there was no parallel load on the 
nodes during these tests.




> Test and optimize metastorage snapshot transfer and recovery speed for new 
> nodes
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-23550
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23550
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Test and optimize metastorage snapshot transfer and recovery speed for new 
> nodes.
> Let's assume that we have a 100Mb+ meta-storage snapshot and 100k+ entries in 
> raft log replicated as log.
> How long would it take for a new node to join the cluster under these 
> conditions? Will something break? What can we do to make it work?
> The goal is that the joining process should work for long-running clusters. 
> It should be pretty fast as well: less than 10 seconds for sure, depending, 
> of course, on the network capabilities. No timeout errors should occur if it 
> takes more than 10 seconds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
