[ 
https://issues.apache.org/jira/browse/IGNITE-16582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521073#comment-17521073
 ] 

Roman Puchkovskiy commented on IGNITE-16582:
--------------------------------------------

I've tested the *New* code (the simpler sub-throttler is removed, the more 
complex one is enabled) against the *Old* code. There were 3 scenarios:
 * Fast (checkpointing runs at full speed, puts run at the maximum possible 
speed)
 * Slow (checkpointing speed is limited to approximately 300 pages/sec, which 
is a lot slower than the rate at which pages are dirtied; puts run at the 
maximum possible speed)
 * Saw (checkpointing speed is limited as in the Slow scenario, but put speed 
switches back and forth between full speed and zero every 10 seconds)

Results follow:
 # Old, Slow: 1890 put/sec, 120 sec/checkpoint
 # New, Slow: 1857 put/sec, 245 sec/cp
 # New, Slow (page write = page sync): 3412 put/sec, 133 sec/cp
 # Old, Fast: 11040 put/sec, 9 sec/cp
 # New, Fast: 16012 put/sec, 6 sec/cp
 # Old, Saw: 1769 put/sec, 105 sec/cp
 # New, Saw: 1814 put/sec, 243 sec/cp
 # Old, Slow (1hr): 1310 put/sec, 54 sec/cp (4/63), max dirty: 73%
 # New, Slow (1hr): 2366 put/sec, 89 sec/cp (22/18), max dirty: 53%

All runs except the last 2 took 30 minutes; the last 2 runs (8 and 9) took 
1 hour each.

It can be seen that the average checkpoint duration has increased with the new 
code, but this is caused by the way the slowdown was produced: page writes were 
slowed down but page syncs to disk were not, so checkpoint scheduling is off. 
Result number 3 emulates equal write/sync durations and demonstrates that the 
checkpoint duration is back to normal (133 sec/cp versus 120 sec/cp for the Old 
code); so the checkpoint duration is not a problem, it's just an artifact of 
the tests.

Other than that, the results demonstrate that the new code does not hurt 
throughput, and in some cases improves it significantly.
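
To make the throughput comparison concrete, here is a small sketch that 
computes the relative change in put/sec between the Old and New runs of each 
scenario. The numbers are copied from results 1-2, 4-5, 6-7 and 8-9 above; the 
class and method names are made up for this illustration.

```java
import java.util.Locale;

/** Relative throughput change between the Old and New runs listed above. */
final class ThroughputDelta {
    /** Percent change from oldValue to newValue; positive means New is faster. */
    static double percentChange(double oldValue, double newValue) {
        return (newValue - oldValue) / oldValue * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {"Slow", "Fast", "Saw", "Slow (1hr)"};
        // Each row is {old put/sec, new put/sec} for the scenario with the same index.
        double[][] putsPerSec = {{1890, 1857}, {11040, 16012}, {1769, 1814}, {1310, 2366}};

        for (int i = 0; i < names.length; i++) {
            double delta = percentChange(putsPerSec[i][0], putsPerSec[i][1]);
            System.out.printf(Locale.ROOT, "%-10s %+.1f%%%n", names[i], delta);
        }
    }
}
```

This gives roughly -1.7% for Slow, +45.0% for Fast, +2.5% for Saw and +80.6% 
for the 1-hour Slow run.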

> Improve behavior of speed-based throttling when dirty pages ratio is low
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-16582
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16582
>             Project: Ignite
>          Issue Type: Improvement
>          Components: persistence
>    Affects Versions: 2.12
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>             Fix For: 2.14
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a log:
> {{Throttling is applied to page modifications [}}
> percentOfPartTime=0.59, 
> markDirty=7424 pages/sec, 
> checkpointWrite=6268 pages/sec, 
> estIdealMarkDirty=0 pages/sec, 
> curDirty=0.00, 
> maxDirty=0.24, 
> avgParkTime=79770 ns, 
> {{pages: (total=67085, evicted=0, written=40916, synced=0, cpBufUsed=3, 
> cpBufTotal=518215)]}}
> Here, it can be seen that, although there are plenty of non-dirty pages, 
> throttling is applied. This happens because our speed-based throttling has 2 
> algorithms for protecting non-dirty pages from exhaustion:
>  # A more complex one that computes max allowable dirty ratio and ideal 
> marking speed and throttles when both dirty ratio and current marking speed 
> surpass these values
>  # A simpler one that throttles if the current marking speed is higher than 
> the average checkpointing speed
> In the shown example the first algorithm does not throttle, but the second 
> one does.
> It looks like the throttling is enabled too early.
> One way to solve this problem is to just disable the second algorithm as the 
> first seems to be more adequate (but this needs careful consideration of all 
> possible cases).
> Another way is to consider averaged marking speed instead of (or in addition 
> to) the current marking speed when deciding whether to throttle or not.
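
For illustration only, the two rules from the description, plus the "averaged 
marking speed" alternative, can be sketched as below. This is not Ignite's 
actual code: the class, method and parameter names are hypothetical, and it 
assumes that the log fields curDirty, maxDirty, markDirty, estIdealMarkDirty 
and checkpointWrite correspond to the quantities the description talks about.

```java
/** Hypothetical sketch of the throttling rules described in the issue; not Ignite's real code. */
final class ThrottleSketch {
    /** Rule 1: throttle only when both the dirty ratio and the marking speed exceed their targets. */
    static boolean complexRule(double dirtyRatio, double maxAllowedDirtyRatio,
                               double markSpeed, double idealMarkSpeed) {
        return dirtyRatio > maxAllowedDirtyRatio && markSpeed > idealMarkSpeed;
    }

    /** Rule 2: throttle whenever the current marking speed exceeds the average checkpoint speed. */
    static boolean simpleRule(double markSpeed, double avgCheckpointSpeed) {
        return markSpeed > avgCheckpointSpeed;
    }

    /** Alternative: compare the mean of recent marking speed samples instead of the current one. */
    static boolean averagedRule(double[] recentMarkSpeeds, double avgCheckpointSpeed) {
        double sum = 0;
        for (double speed : recentMarkSpeeds) {
            sum += speed;
        }
        double mean = recentMarkSpeeds.length == 0 ? 0 : sum / recentMarkSpeeds.length;
        return mean > avgCheckpointSpeed;
    }

    public static void main(String[] args) {
        // Values taken from the log line quoted in the description.
        System.out.println("rule 1 throttles: " + complexRule(0.00, 0.24, 7424, 0));
        System.out.println("rule 2 throttles: " + simpleRule(7424, 6268));
    }
}
```

Under these assumptions, rule 1 does not throttle for the logged values 
(curDirty=0.00 is below maxDirty=0.24), while rule 2 does (7424 pages/sec is 
above 6268 pages/sec), which matches the behaviour the description reports.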



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
