[
https://issues.apache.org/jira/browse/IGNITE-23550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903806#comment-17903806
]
Kirill Tkalenko edited comment on IGNITE-23550 at 12/7/24 10:52 AM:
--------------------------------------------------------------------
Based on the results of running the tests from [PR
4845|https://github.com/apache/ignite-3/pull/4845] locally and on TC, I drew
several conclusions.
As the JFR analysis has shown, we spend a significant amount of time saving the
checksum on each write to the metastorage, because it is written to the WAL in
synchronous mode.
Results of executing *MetaStorageManager#put* 100k times with and without sync
mode for the checksum:
||Disable sync for checksum||TC/Local||Time||
|True|TC|4s 12ms 204us 192866ns, totalMs=4012, totalNs=4012204866|
|False|TC|22s 12ms 732us 720751ns, totalMs=22012, totalNs=22012732751|
|True|Local|2s 218ms 965us 747459ns, totalMs=2218, totalNs=2218965459|
|False|Local|5m 10s 157ms 117us, totalMs=310157, totalNs=310157117417|
From the table we can conclude that disabling WAL sync for the checksum speeds
up writes considerably: by roughly 5x on TC and far more locally. How can we
optimize it? I think we can hook into the mechanism that sends raft commands
and write the required checksum as an additional field of the command, then
work with it from there, since raft commands are already written to the WAL in
sync mode.
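To make the idea concrete, here is a minimal sketch of a command that carries
its checksum as an extra field, so the checksum is persisted together with the
command instead of through a separate synced write. All names here
(ChecksummedCommand, ChecksumChain) are hypothetical, and CRC32 is used only as
a stand-in; this is not the actual Ignite checksum algorithm or raft command
API.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical command envelope: the payload plus the checksum as an extra field. */
final class ChecksummedCommand {
    final byte[] payload;
    final long checksum;

    ChecksummedCommand(byte[] payload, long checksum) {
        this.payload = payload;
        this.checksum = checksum;
    }
}

/** Hypothetical helper that chains each checksum to the previous one. */
final class ChecksumChain {
    /** Combines the previous checksum with the new payload (CRC32 is an assumption). */
    static long next(long previousChecksum, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(previousChecksum).array());
        crc.update(payload);
        return crc.getValue();
    }

    /** Builds the command that would be sent through raft, checksum included. */
    static ChecksummedCommand wrap(long previousChecksum, byte[] payload) {
        return new ChecksummedCommand(payload, next(previousChecksum, payload));
    }

    /** Recomputes the checksum on the receiving side and compares it with the field. */
    static boolean verify(long previousChecksum, ChecksummedCommand command) {
        return next(previousChecksum, command.payload) == command.checksum;
    }
}
```

With this shape, the checksum becomes durable as part of the raft log entry, so
no additional sync to the WAL would be needed for it.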
Results of a node restart with and without sync mode for the checksum, with
100k puts in the raft log:
||Disable sync for checksum||TC/Local||Time||
|True|TC|4s 744ms 454us, totalMs=4744, totalNs=4744454870|
|False|TC|24s 195ms 34us, totalMs=24195, totalNs=24195034186|
|True|Local|2s 877ms 511us, totalMs=2877, totalNs=2877511875|
|False|Local|6m 41s 771ms 593us, totalMs=401771, totalNs=401771593084|
This table also shows that there will be a performance gain if we fix the
situation with the checksum sync mode.
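The gain in the two tables above can be quantified directly from the reported
totalMs values. The small sketch below only does that arithmetic; the class and
method names are illustrative and not part of the tests in the PR.

```java
final class SpeedupSketch {
    /** Ratio of the run with checksum sync enabled to the run with it disabled. */
    static double speedup(long syncMs, long noSyncMs) {
        if (syncMs <= 0 || noSyncMs <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        return (double) syncMs / noSyncMs;
    }

    public static void main(String[] args) {
        // totalMs values copied from the two tables above.
        System.out.printf("put, TC:        %.1fx%n", speedup(22_012, 4_012));
        System.out.printf("put, Local:     %.1fx%n", speedup(310_157, 2_218));
        System.out.printf("restart, TC:    %.1fx%n", speedup(24_195, 4_744));
        System.out.printf("restart, Local: %.1fx%n", speedup(401_771, 2_877));
    }
}
```

On TC both ratios come out at roughly 5x, while locally they are around 140x,
so the cost of the synced checksum write depends heavily on the environment.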
Now let's look at the node restart results with and without a snapshot taken
before the restart:
||TC/Local||Disable sync for checksum||Snapshot before restart||Time||
|TC|True|True|1s 890ms 270us, totalMs=1890, totalNs=1890270734|
|TC|True|False|4s 744ms 454us, totalMs=4744, totalNs=4744454870|
|TC|False|True|1s 836ms 376us, totalMs=1836, totalNs=1836376733|
|TC|False|False|24s 195ms 34us, totalMs=24195, totalNs=24195034186|
|Local|True|True|955ms 622us, totalMs=955, totalNs=955622750|
|Local|True|False|2s 877ms 511us, totalMs=2877, totalNs=2877511875|
|Local|False|True|804ms 878us 74708ns, totalMs=804, totalNs=804878708|
|Local|False|False|6m 41s 771ms 593us, totalMs=401771, totalNs=401771593084|
We can conclude that taking a snapshot before restarting a node makes node
startup much faster: with a snapshot, the restart takes under 2 seconds
regardless of the checksum sync mode.
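The reason a snapshot helps is the usual raft recovery rule: state is restored
from the snapshot, and only log entries after the snapshot index are replayed.
The toy sketch below illustrates that rule with an in-memory map; the class and
its methods are hypothetical and do not mirror the Ignite or raft
implementation.

```java
import java.util.List;
import java.util.Map;

final class SnapshotReplaySketch {
    /** One replicated put; index is the 1-based position in the log. */
    static final class Entry {
        final long index;
        final String key;
        final String value;

        Entry(long index, String key, String value) {
            this.index = index;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Rebuilds the state from a snapshot plus the log tail.
     * Returns the number of log entries that had to be replayed.
     */
    static int recover(Map<String, String> state, Map<String, String> snapshot,
            long snapshotIndex, List<Entry> log) {
        state.clear();
        state.putAll(snapshot);

        int replayed = 0;
        for (Entry entry : log) {
            // Entries covered by the snapshot are skipped entirely.
            if (entry.index > snapshotIndex) {
                state.put(entry.key, entry.value);
                replayed++;
            }
        }
        return replayed;
    }
}
```

Without a snapshot (snapshotIndex = 0) all 100k entries are replayed, each
paying the per-write cost measured earlier; with a snapshot taken right before
the restart, nothing is replayed, which matches the restart time no longer
depending on the checksum sync mode.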
Here is how long it takes to create and restore a snapshot for a 200 MB storage:
||TC/Local||do/restore snapshot||Time||
|TC|do|1s 911ms 127us, totalMs=1911, totalNs=1911127356|
|TC|restore|179ms 976us 797710ns, totalMs=179, totalNs=179976710|
|Local|do|799ms 362us, totalMs=799, totalNs=799362375|
|Local|restore|105ms 686us 581542ns, totalMs=105, totalNs=105686542|
We can conclude that creating a snapshot and restoring from it can provide a
performance boost. Note, however, that there was no parallel load on the nodes
during these tests.
> Test and optimize metastorage snapshot transfer and recovery speed for new
> nodes
> --------------------------------------------------------------------------------
>
> Key: IGNITE-23550
> URL: https://issues.apache.org/jira/browse/IGNITE-23550
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Test and optimize metastorage snapshot transfer and recovery speed for new
> nodes.
> Let's assume that we have a 100 MB+ meta-storage snapshot and 100k+ entries
> in the raft log that are replicated as log entries.
> How long would it take for a new node to join the cluster under these
> conditions? Will something break? What can we do to make it work?
> The goal is for the joining process to work for long-running clusters. It
> should also be fast: less than 10 seconds, depending of course on the
> network capabilities. No timeout errors should occur if it takes more than
> 10 seconds.