[ 
https://issues.apache.org/jira/browse/IGNITE-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699710#comment-17699710
 ] 

Ivan Bessonov edited comment on IGNITE-18475 at 3/14/23 7:28 AM:
-----------------------------------------------------------------

Now, some ideas.

Things that just feel right to do in order to at least make clusters 
recoverable, even if with data loss:
 # When a RAFT snapshot is being taken, the log must be synced.
 # When storage performs its own independent flush of data, the log must also be 
synced. Data in the storage cannot be in a "fresher" state than the latest 
log entry.
 # When there's a RAFT configuration update and RAFT meta is stored to the 
storage, the log with the corresponding configuration update entry must be synced.
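
A minimal sketch of the ordering rule in the list above (sync the log before storage persists anything newer), in Java. The RaftLog and Storage types and all their methods are hypothetical stand-ins invented for illustration; they are not Ignite or JRaft APIs, and sync() only models an fsync.

```java
/** Hypothetical sketch: every name here is illustrative, not an Ignite or JRaft API. */
final class SyncBeforeFlushSketch {
    /** Toy RAFT log that tracks which appended entries are durable. */
    static final class RaftLog {
        private long lastAppendedIndex;
        private long lastSyncedIndex;
        private int syncCalls;

        /** Appends one entry without syncing and returns its index. */
        long append() { return ++lastAppendedIndex; }

        /** Models an fsync: everything appended so far becomes durable. */
        void sync() { lastSyncedIndex = lastAppendedIndex; syncCalls++; }

        long lastSyncedIndex() { return lastSyncedIndex; }
        int syncCalls() { return syncCalls; }
    }

    /** Toy storage that remembers the last applied and last flushed log index. */
    static final class Storage {
        private long lastAppliedIndex;
        private long lastFlushedIndex;

        void apply(long index) { lastAppliedIndex = index; }
        void flush() { lastFlushedIndex = lastAppliedIndex; }

        long lastAppliedIndex() { return lastAppliedIndex; }
        long lastFlushedIndex() { return lastFlushedIndex; }
    }

    /** Flushes storage only after the log durably covers what storage will persist. */
    static void flushStorage(RaftLog log, Storage storage) {
        if (storage.lastAppliedIndex() > log.lastSyncedIndex()) {
            log.sync();
        }
        storage.flush();
    }

    public static void main(String[] args) {
        RaftLog log = new RaftLog();
        Storage storage = new Storage();
        storage.apply(log.append());
        flushStorage(log, storage);
        System.out.println("synced=" + log.lastSyncedIndex()
                + ", flushed=" + storage.lastFlushedIndex());
    }
}
```

The same guard would apply before a RAFT snapshot or before persisting RAFT meta on a configuration update; a second flush with no new entries does not trigger another sync.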

Other things that may be right to do:
 # Fix IGNITE-15568; it may give us a significant throughput boost under 
decent loads. I'd give this the highest priority.
 # If anything, TX state updates should probably be synced, but I'm not sure 
yet.
 # TX commits should probably be synced. These two questions should be 
carefully examined by folks who understand the transactional protocol better than I 
do.

How to deal with problems after restart, if sync was disabled:
 # First of all, detection. Maybe we should propagate the last applied index and 
term for every group while joining, and have a manual reconciliation procedure 
or manual leader establishment, choosing the node with the highest index value.
 # Second, depending on what we choose in item 1, it's either a manual 
transfer of log entries, or relying on RAFT's implementation of the same 
process.
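
A rough sketch of the detection idea in item 1, in Java: each node reports its last applied term and index for a group, and the most advanced node is picked as the candidate for manual leader establishment. The NodeState type and mostAdvanced method are hypothetical. Comparing by term first and then by index is an assumption borrowed from how RAFT usually compares logs; the comment itself only mentions the highest index value.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: every name here is illustrative, not an Ignite or JRaft API. */
final class LeaderCandidateSketch {
    /** What a node would report for one RAFT group while joining. */
    static final class NodeState {
        final String nodeName;
        final long lastAppliedTerm;
        final long lastAppliedIndex;

        NodeState(String nodeName, long lastAppliedTerm, long lastAppliedIndex) {
            this.nodeName = nodeName;
            this.lastAppliedTerm = lastAppliedTerm;
            this.lastAppliedIndex = lastAppliedIndex;
        }
    }

    /** Picks the node with the highest (term, index) pair; empty if nothing was reported. */
    static Optional<NodeState> mostAdvanced(List<NodeState> reports) {
        return reports.stream().max(
                Comparator.comparingLong((NodeState s) -> s.lastAppliedTerm)
                        .thenComparingLong(s -> s.lastAppliedIndex));
    }

    public static void main(String[] args) {
        List<NodeState> reports = Arrays.asList(
                new NodeState("node-a", 3, 10),
                new NodeState("node-b", 4, 7),
                new NodeState("node-c", 4, 9));
        mostAdvanced(reports).ifPresent(s -> System.out.println(s.nodeName));
    }
}
```

Nodes whose reported state is behind the chosen candidate would then need the missing log entries, either through a manual transfer or through RAFT's own replication, as item 2 describes.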

 



> Huge performance drop with enabled sync write per log entry for RAFT logs
> -------------------------------------------------------------------------
>
>                 Key: IGNITE-18475
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18475
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Kirill Gusakov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> During YCSB benchmark runs for ignite-3 beta1, we found significant 
> performance issues with select/insert queries.
> One of the root causes of these issues: every log entry is written to RocksDB 
> with the sync option enabled (which leads to frequent fsync calls).
> These issues can be reproduced with localized JMH benchmarks 
> [SelectBenchmark|https://github.com/gridgain/apache-ignite-3/blob/4b9de922caa4aef97a5e8e159d5db76a3fc7a3ad/modules/runner/src/test/java/org/apache/ignite/internal/benchmark/SelectBenchmark.java#L39]
>  and 
> [InsertBenchmark|https://github.com/gridgain/apache-ignite-3/blob/4b9de922caa4aef97a5e8e159d5db76a3fc7a3ad/modules/runner/src/test/java/org/apache/ignite/internal/benchmark/InsertBenchmark.java#L29]
>  with RaftOptions.sync=true/false:
>  * jdbc select queries: 115ms vs 4ms
>  * jdbc insert queries: 70ms vs 2.5ms
> (These results were obtained on a MacBook Pro (16-inch, 2019), and it looks like 
> macOS has a slow fsync in general, but runs on Ubuntu also show a huge 
> difference (~26 times for the insert test). So your environment may show 
> a different, but still huge, difference.)
> Why select queries suffer from syncs even more than inserts is described in 
> https://issues.apache.org/jira/browse/IGNITE-18474.
> Possible solutions for the issue:
>  * Don't sync every raft record in RocksDB by default, although this can break 
> the raft invariants
>  * Investigate the internals of RocksDB (according to syscall tracing, not 
> every write with sync produces an fsync syscall); maybe other strategies will 
> be suitable for our cases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
