[
https://issues.apache.org/jira/browse/IGNITE-16582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521073#comment-17521073
]
Roman Puchkovskiy commented on IGNITE-16582:
--------------------------------------------
I've tested the *New* code (the simpler sub-throttler is removed, the more
complex one is enabled) vs the *Old* code. There were 3 scenarios:
* Fast (checkpointing happens at full speed, puts happen at the full speed
possible)
* Slow (checkpointing speed is limited to approximately 300 pages/sec, which
is a lot slower than the rate at which pages are dirtied; puts happen at the
full speed possible)
* Saw (checkpointing speed is limited as in the Slow scenario, but put speed is
switched back and forth between full speed and zero speed, switching every
10 seconds)
Results follow:
# Old, Slow: 1890 put/sec, 120 sec/checkpoint
# New, Slow: 1857 put/sec, 245 sec/cp
# New, Slow (page write = page sync): 3412 put/sec, 133 sec/cp
# Old, Fast: 11040 put/sec, 9 sec/cp
# New, Fast: 16012 put/sec, 6 sec/cp
# Old, Saw: 1769 put/sec, 105 sec/cp
# New, Saw: 1814 put/sec, 243 sec/cp
# Old, Slow (1hr): 1310 put/sec, 54 sec/cp (4/63), max dirty: 73%
# New, Slow (1hr): 2366 put/sec, 89 sec/cp (22/18), max dirty: 53%
All runs except the last two took 30 minutes; the last two runs (8 and 9) took
1 hour each.
It can be seen that the average checkpoint duration has increased with the new
code, but this is caused by the way the slowdown was produced: page writes were
slowed down, but page syncs to disk were not, so checkpoint scheduling is
off. Result number 3 emulates equal write/sync durations, and it demonstrates
that the checkpoint duration is back to normal; so the checkpoint duration is
not a problem, it's just an artifact of the tests.
Other than that, the results demonstrate that the new code does not hurt
throughput, and in some cases improves it significantly.
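To make the throughput comparison concrete, the relative change of New vs Old can be computed from the put/sec figures listed above. The following is only an illustrative sketch: the class {{ThroughputDelta}} and its method are hypothetical helpers, not part of Ignite, and result 3 is left out because it has no Old counterpart.

```java
import java.util.Locale;

// Illustrative helper (hypothetical, not part of Ignite):
// relative change of New vs Old throughput for each scenario.
final class ThroughputDelta {
    /** Relative change in percent; a positive value means New is faster than Old. */
    static double percentChange(double oldPutsPerSec, double newPutsPerSec) {
        return (newPutsPerSec - oldPutsPerSec) / oldPutsPerSec * 100.0;
    }

    public static void main(String[] args) {
        // put/sec figures copied from the results list in this comment.
        String[] scenario = {"Slow", "Fast", "Saw", "Slow (1hr)"};
        double[] oldRate = {1890, 11040, 1769, 1310};
        double[] newRate = {1857, 16012, 1814, 2366};

        for (int i = 0; i < scenario.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s %+.1f%%%n",
                scenario[i], percentChange(oldRate[i], newRate[i]));
        }
    }
}
```

With these figures, the 30-minute Slow and Saw runs stay within about 3% of the old throughput, while the Fast run gains about 45% and the 1-hour Slow run about 81%.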
> Improve behavior of speed-based throttling when dirty pages ratio is low
> ------------------------------------------------------------------------
>
> Key: IGNITE-16582
> URL: https://issues.apache.org/jira/browse/IGNITE-16582
> Project: Ignite
> Issue Type: Improvement
> Components: persistence
> Affects Versions: 2.12
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Fix For: 2.14
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> There is a log:
> {{Throttling is applied to page modifications [}}
> percentOfPartTime=0.59,
> markDirty=7424 pages/sec,
> checkpointWrite=6268 pages/sec,
> estIdealMarkDirty=0 pages/sec,
> curDirty=0.00,
> maxDirty=0.24,
> avgParkTime=79770 ns,
> {{pages: (total=67085, evicted=0, written=40916, synced=0, cpBufUsed=3,
> cpBufTotal=518215)]}}
> Here, it can be seen that, although there are plenty of non-dirty pages,
> throttling is applied. This happens because our speed-based throttling has 2
> algorithms for protecting non-dirty pages from exhaustion:
> # A more complex one that computes max allowable dirty ratio and ideal
> marking speed and throttles when both dirty ratio and current marking speed
> surpass these values
> # A simpler one that throttles if the current marking speed is higher than
> the average checkpointing speed
> In the shown example the first algorithm does not throttle, but the second
> one does.
> It looks like the throttling is enabled too early.
> One way to solve this problem is to just disable the second algorithm as the
> first seems to be more adequate (but this needs careful consideration of all
> possible cases).
> Another way is to consider averaged marking speed instead of (or in addition
> to) the current marking speed when deciding whether to throttle or not.
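The two checks described in the issue, and the averaged-speed alternative, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Ignite throttling code: the class {{ThrottleSketch}}, its method names, and the plain arithmetic mean are all hypothetical, and the marking-speed history in the last call is made up for the example. Only the first two calls use values taken from the log above.

```java
// Simplified illustration only; this is NOT the actual Ignite throttling code.
// All names here are hypothetical.
final class ThrottleSketch {
    /** First algorithm: throttle only when BOTH the dirty ratio and the marking speed exceed their targets. */
    static boolean complexThrottles(double curDirtyRatio, double maxDirtyRatio,
                                    long markDirtySpeed, long idealMarkDirtySpeed) {
        return curDirtyRatio > maxDirtyRatio && markDirtySpeed > idealMarkDirtySpeed;
    }

    /** Second algorithm: throttle when the current marking speed exceeds the average checkpoint write speed. */
    static boolean simpleThrottles(long markDirtySpeed, long avgCheckpointWriteSpeed) {
        return markDirtySpeed > avgCheckpointWriteSpeed;
    }

    /** Proposed variant (assumed form): compare the mean of recent marking-speed samples instead. */
    static boolean simpleThrottlesOnAverage(long[] recentMarkDirtySpeeds, long avgCheckpointWriteSpeed) {
        long sum = 0;
        for (long speed : recentMarkDirtySpeeds) {
            sum += speed;
        }
        double avg = recentMarkDirtySpeeds.length == 0 ? 0.0 : (double) sum / recentMarkDirtySpeeds.length;
        return avg > avgCheckpointWriteSpeed;
    }

    public static void main(String[] args) {
        // Values from the log: curDirty=0.00, maxDirty=0.24, markDirty=7424,
        // estIdealMarkDirty=0, checkpointWrite=6268.
        System.out.println(complexThrottles(0.00, 0.24, 7424, 0)); // dirty ratio is below the maximum
        System.out.println(simpleThrottles(7424, 6268));           // marking is faster than checkpoint writes

        // Hypothetical history: a short burst to 7424 pages/sec after slower samples.
        System.out.println(simpleThrottlesOnAverage(new long[] {1200, 900, 7424}, 6268));
    }
}
```

With the logged values the first check does not throttle and the second does, which matches the behavior described in the issue. In the made-up history, averaging keeps a short burst from triggering the second check.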