[
https://issues.apache.org/jira/browse/IGNITE-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699710#comment-17699710
]
Ivan Bessonov edited comment on IGNITE-18475 at 3/14/23 7:28 AM:
-----------------------------------------------------------------
Now, some ideas.
Things that just feel right to do in order to at least make clusters
recoverable, even if with data loss:
# When there's a RAFT snapshot taking place, the log must be synced.
# When the storage performs its own independent flush of data, the log must
also be synced. Data in the storage cannot be in a "fresher" state than the
latest log entry.
# When there's a RAFT configuration update, and RAFT meta is stored to the
storage, the log containing the corresponding configuration update entry must
be synced.
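
A minimal sketch of the ordering rule in the list above, as a toy in-memory model. All class and method names are hypothetical (they are not Ignite or JRaft APIs), and "sync" here only moves a counter instead of calling fsync. The only point is that a storage flush forces a log sync first, so durable storage state never gets ahead of the durable log.

```java
/** Toy model of log/storage durability; every name here is hypothetical. */
final class SyncedLogModel {
    private long lastAppendedIndex;   // highest index written to the log (possibly only in OS cache)
    private long lastSyncedIndex;     // highest index assumed durable in the log
    private long lastAppliedIndex;    // highest index applied to the in-memory storage state
    private long flushedStorageIndex; // highest index whose effects are durable in the storage

    /** Appends one entry without syncing, as with sync disabled. */
    long append() {
        return ++lastAppendedIndex;
    }

    /** Applies everything appended so far to the storage state. */
    void applyAll() {
        lastAppliedIndex = lastAppendedIndex;
    }

    /** Stands in for an fsync of the log. */
    void syncLog() {
        lastSyncedIndex = lastAppendedIndex;
    }

    /** Storage flush (or snapshot): sync the log first if storage would otherwise get ahead. */
    void flushStorage() {
        if (lastSyncedIndex < lastAppliedIndex) {
            syncLog();
        }
        flushedStorageIndex = lastAppliedIndex;
    }

    long lastSyncedIndex() {
        return lastSyncedIndex;
    }

    long flushedStorageIndex() {
        return flushedStorageIndex;
    }

    /** Durable storage state must never be fresher than the durable log. */
    boolean invariantHolds() {
        return flushedStorageIndex <= lastSyncedIndex;
    }

    public static void main(String[] args) {
        SyncedLogModel model = new SyncedLogModel();
        model.append();
        model.append();
        model.applyAll();
        model.flushStorage(); // triggers syncLog() because nothing was synced yet
        System.out.println("invariant holds: " + model.invariantHolds());
    }
}
```

In this model the per-entry sync is gone, but the sync still happens at the points where losing the log tail would leave the storage ahead of the log.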
Other things that may be right to do:
# Fix IGNITE-15568; it may give us a significant throughput boost under
decent loads. I'd give this the highest priority.
# If anything, TX state updates should probably be synced, but I'm not sure
yet.
# TX commits should probably be synced. These two questions should be
carefully examined by folks who understand the transactional protocol better
than I do.
How to deal with problems after restart, if sync was disabled:
# First of all, detection. Maybe we should propagate the last applied index
and term for every group while joining, and have a manual reconciliation
procedure or manual leader establishment, choosing the node with the highest
index value.
# Second, depending on what we choose in item 1, it's either a manual
transfer of log entries, or relying on RAFT's implementation of the same
process.
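
To illustrate the detection idea, here is a rough sketch of picking the candidate for manual leader establishment from the states reported by nodes while joining. The `GroupState` shape and the `pick` helper are hypothetical. Comparing the term first and the index second is an assumption borrowed from RAFT's usual "more up-to-date log" rule; the text above only says "highest index value".

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper; not an Ignite or JRaft API. */
final class LeaderCandidatePicker {
    /** What a node might report for one RAFT group while joining (assumed shape). */
    static final class GroupState {
        final String nodeName;
        final long lastAppliedTerm;
        final long lastAppliedIndex;

        GroupState(String nodeName, long lastAppliedTerm, long lastAppliedIndex) {
            this.nodeName = nodeName;
            this.lastAppliedTerm = lastAppliedTerm;
            this.lastAppliedIndex = lastAppliedIndex;
        }
    }

    /** Picks the most advanced node: higher term wins, then higher index. */
    static Optional<GroupState> pick(List<GroupState> reported) {
        return reported.stream().max(
            Comparator.comparingLong((GroupState s) -> s.lastAppliedTerm)
                .thenComparingLong(s -> s.lastAppliedIndex));
    }

    public static void main(String[] args) {
        List<GroupState> reported = List.of(
            new GroupState("node-a", 2, 10),
            new GroupState("node-b", 3, 7),
            new GroupState("node-c", 3, 9));
        pick(reported).ifPresent(s -> System.out.println("candidate: " + s.nodeName));
    }
}
```

Nodes that report a lower index than the chosen candidate are the ones that lost log entries and would need either the manual transfer or RAFT's own replication to catch up.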
> Huge performance drop with enabled sync write per log entry for RAFT logs
> -------------------------------------------------------------------------
>
> Key: IGNITE-18475
> URL: https://issues.apache.org/jira/browse/IGNITE-18475
> Project: Ignite
> Issue Type: Task
> Reporter: Kirill Gusakov
> Assignee: Ivan Bessonov
> Priority: Major
> Labels: ignite-3
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> During YCSB benchmark runs for ignite-3 beta1, we found significant
> performance issues with select/insert queries.
> One of the root causes of these issues: every log entry is written to RocksDB
> with the sync option enabled (which leads to frequent fsync calls).
> These issues can be reproduced with local JMH benchmarks
> [SelectBenchmark|https://github.com/gridgain/apache-ignite-3/blob/4b9de922caa4aef97a5e8e159d5db76a3fc7a3ad/modules/runner/src/test/java/org/apache/ignite/internal/benchmark/SelectBenchmark.java#L39]
> and
> [InsertBenchmark|https://github.com/gridgain/apache-ignite-3/blob/4b9de922caa4aef97a5e8e159d5db76a3fc7a3ad/modules/runner/src/test/java/org/apache/ignite/internal/benchmark/InsertBenchmark.java#L29]
> with RaftOptions.sync=true/false:
> * jdbc select queries: 115ms vs 4ms
> * jdbc insert queries: 70ms vs 2.5ms
> (These results were obtained on a MacBook Pro (16-inch, 2019), and it looks
> like macOS has a slow fsync in general, but runs on Ubuntu also show a huge
> difference (~26 times for the insert test). So your environment may show a
> different, but still huge, difference.)
> Why select queries suffer from syncs even more than inserts is described in
> https://issues.apache.org/jira/browse/IGNITE-18474.
> Possible solutions for the issue:
> * Don't sync every RAFT record in RocksDB by default, although this can break
> RAFT invariants
> * Investigate RocksDB internals (according to syscall tracing, not every
> write with sync produces an fsync syscall); maybe other strategies will be
> suitable for our case
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)