[
https://issues.apache.org/jira/browse/IGNITE-23550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903806#comment-17903806
]
Kirill Tkalenko edited comment on IGNITE-23550 at 12/7/24 10:24 AM:
--------------------------------------------------------------------
Based on the results of running tests from [PR
4845|https://github.com/apache/ignite-3/pull/4845] locally and on TC, I have
drawn several conclusions.
As the JFR analysis has shown, we spend quite a lot of time saving the
checksum on each write to the metastorage, since it is written to a WAL in
synchronous mode.
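To make the cost of a synchronous write concrete, here is a generic sketch using plain java.nio. It is not Ignite's metastorage or WAL code; the class name, record format and temp-file names are assumptions made only for illustration. It appends small records to a file and optionally forces each one to disk, which is roughly what "sync mode" implies for every write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Generic illustration only; this is not Ignite's metastorage or WAL code. */
public class SyncWriteSketch {
    /** Appends {@code count} 8-byte records and returns the number of bytes written. */
    public static long appendRecords(Path file, int count, boolean syncEachWrite) {
        long written = 0;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            for (int i = 0; i < count; i++) {
                buf.clear();
                buf.putLong(i);
                buf.flip();
                while (buf.hasRemaining()) {
                    written += ch.write(buf);
                }
                if (syncEachWrite) {
                    // Blocks until the data reaches the storage device.
                    ch.force(false);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        for (boolean sync : new boolean[] {false, true}) {
            Path file = Files.createTempFile("wal-sketch", ".bin");
            long start = System.nanoTime();
            appendRecords(file, 1_000, sync);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("syncEachWrite=" + sync + ", elapsedMs=" + elapsedMs);
            Files.delete(file);
        }
    }
}
```

The absolute timings depend heavily on the disk and file system, so the sketch is only meant to show where the per-write cost comes from.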
Results of executing *MetaStorageManager#put* 100k times with/without the
synchronous write mode of the checksum:
||Disable sync for checksum||TC/Local||Time||
|True|TC|4s 12ms 204us 192866ns, totalMs=4012, totalNs=4012204866|
|False|TC|22s 12ms 732us 720751ns, totalMs=22012, totalNs=22012732751|
|True|Local|2s 218ms 965us 747459ns, totalMs=2218, totalNs=2218965459|
|False|Local|5m 10s 157ms 117us, totalMs=310157, totalNs=310157117417|
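For reference, the slowdown factors implied by the table can be computed from the totalMs values. The small sketch below does only that arithmetic; the class and method names are made up for illustration.

```java
import java.util.Locale;

/** Arithmetic on the totalMs values from the table above; names are illustrative. */
public class SyncSlowdown {
    /** Ratio of the run with the synchronous checksum write to the run without it. */
    public static double slowdown(long syncMs, long noSyncMs) {
        return (double) syncMs / noSyncMs;
    }

    public static void main(String[] args) {
        // totalMs values copied from the table above.
        System.out.printf(Locale.ROOT, "TC: %.1fx%n", slowdown(22012, 4012));      // TC: 5.5x
        System.out.printf(Locale.ROOT, "Local: %.1fx%n", slowdown(310157, 2218));  // Local: 139.8x
    }
}
```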
From the table we can conclude that disabling the sync write to the WAL for
the checksum would improve our performance several times over. How can we
optimize it? I think we can hook into the mechanism for sending raft commands
and write the required checksum as an additional field of the command, and
then work with it from there, since raft commands are already written to the
WAL in sync mode.
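A rough sketch of the idea of carrying the checksum inside the command, under stated assumptions: PutCommand, nextChecksum, newPut and verify are hypothetical names, and chaining CRC32 over the previous checksum is an assumption made for the example. None of this is taken from the Ignite code base or from the PR.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical sketch; the names and the checksum algorithm are assumptions, not Ignite code. */
public class ChecksumInCommandSketch {
    /** Hypothetical write command that carries its checksum as an additional field. */
    public static final class PutCommand {
        final byte[] key;
        final byte[] value;
        final long checksum;

        PutCommand(byte[] key, byte[] value, long checksum) {
            this.key = key;
            this.value = value;
            this.checksum = checksum;
        }
    }

    /** Chains the previous checksum with the new key and value. */
    public static long nextChecksum(long prevChecksum, byte[] key, byte[] value) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(prevChecksum).array());
        crc.update(key);
        crc.update(value);
        return crc.getValue();
    }

    /** Sender side: the checksum is computed before the command is handed to replication. */
    public static PutCommand newPut(long prevChecksum, byte[] key, byte[] value) {
        return new PutCommand(key, value, nextChecksum(prevChecksum, key, value));
    }

    /** Apply side: the checksum arrives with the command, so no separate synced write is needed. */
    public static boolean verify(long prevChecksum, PutCommand cmd) {
        return cmd.checksum == nextChecksum(prevChecksum, cmd.key, cmd.value);
    }

    public static void main(String[] args) {
        PutCommand cmd = newPut(0L, new byte[] {1}, new byte[] {2});
        System.out.println("verified=" + verify(0L, cmd));
    }
}
```

The point of the design is that the checksum becomes durable as part of the command that is already written to the log in sync mode, instead of requiring its own synchronous write.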
> Test and optimize metastorage snapshot transfer and recovery speed for new
> nodes
> --------------------------------------------------------------------------------
>
> Key: IGNITE-23550
> URL: https://issues.apache.org/jira/browse/IGNITE-23550
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Test and optimize metastorage snapshot transfer and recovery speed for new
> nodes.
> Let's assume that we have a 100 MB+ metastorage snapshot and 100k+ entries
> in the raft log that are replicated as log entries.
> How long would it take for a new node to join the cluster under these
> conditions? Will something break? What can we do to make it work?
> The goal: the joining process should work for long-running clusters. It
> should be pretty fast as well: less than 10 seconds for sure, depending of
> course on the network capabilities. No timeout errors should occur if it
> takes more than 10 seconds.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)