[jira] [Commented] (RATIS-1886) AppendLog sleep fixed time cause significant drop in write throughput

Wei-Chiu Chuang (Jira) Wed, 13 Sep 2023 11:41:28 -0700


    [ 
https://issues.apache.org/jira/browse/RATIS-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764826#comment-17764826
 ]


Wei-Chiu Chuang commented on RATIS-1886:
----------------------------------------

I'm seeing the same issue with Ozone hflush API latency.

Using the Ozone Freon tool to hflush 32 files concurrently, the median 
latency improves by ~3x (4303.67 ms to 1245.10 ms) after reducing 
raft.server.log.appender.wait-time.min to 1ms.
{noformat}
ozone freon dfsg --buffer=1024 --copy-buffer=1024 -n 1000 --path ofs://ozone1/vol1/bucket1/dfsg -s 10240 -t 32 -sync hflush
{noformat}
Before:
{noformat}
file-create
             count = 1000
         mean rate = 7.78 calls/second
     1-minute rate = 7.11 calls/second
     5-minute rate = 2.68 calls/second
    15-minute rate = 1.01 calls/second
               min = 325.44 milliseconds
               max = 7368.44 milliseconds
              mean = 3998.90 milliseconds
            stddev = 2182.79 milliseconds
            median = 4303.67 milliseconds
              75% <= 6207.01 milliseconds
              95% <= 6848.42 milliseconds
              98% <= 7191.24 milliseconds
              99% <= 7288.67 milliseconds
            99.9% <= 7356.27 milliseconds
{noformat}
After:
{noformat}
file-create
             count = 1000
         mean rate = 24.42 calls/second
     1-minute rate = 22.75 calls/second
     5-minute rate = 21.44 calls/second
    15-minute rate = 21.15 calls/second
               min = 157.16 milliseconds
               max = 2459.12 milliseconds
              mean = 1282.02 milliseconds
            stddev = 850.42 milliseconds
            median = 1245.10 milliseconds
              75% <= 2169.65 milliseconds
              95% <= 2296.89 milliseconds
              98% <= 2361.82 milliseconds
              99% <= 2404.80 milliseconds
            99.9% <= 2459.12 milliseconds
{noformat}
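
To make the comparison explicit, here is a small sketch that computes the improvement factors from the two Freon reports above. The class and method names are illustrative only; the input numbers are copied from the reported output.

```java
import java.util.Locale;

// Illustrative helper, not part of Ozone or Ratis: compares the
// "Before" and "After" Freon figures reported in this comment.
public class FreonComparison {

    // Returns how many times larger `larger` is than `smaller`.
    static double factor(double larger, double smaller) {
        return larger / smaller;
    }

    // Formats a factor with two decimals, independent of the JVM locale.
    static String fmt(double x) {
        return String.format(Locale.ROOT, "%.2fx", x);
    }

    public static void main(String[] args) {
        // Latencies: before / after (lower is better).
        System.out.println("median latency: " + fmt(factor(4303.67, 1245.10)));
        System.out.println("mean latency:   " + fmt(factor(3998.90, 1282.02)));
        System.out.println("99% latency:    " + fmt(factor(7288.67, 2404.80)));
        // Throughput: after / before (higher is better).
        System.out.println("mean rate:      " + fmt(factor(24.42, 7.78)));
    }
}
```

All four factors come out slightly above 3, so the improvement is not limited to the median.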

> AppendLog sleep fixed time cause significant drop in write throughput
> ---------------------------------------------------------------------
>
>                 Key: RATIS-1886
>                 URL: https://issues.apache.org/jira/browse/RATIS-1886
>             Project: Ratis
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 2.5.1
>            Reporter: Yaolong Liu
>            Priority: Major
>         Attachments: image-2023-09-13-15-44-00-933.png
>
>
> In https://issues.apache.org/jira/browse/RATIS-1793 , we enforce 
> raft.server.log.appender.wait-time.min, which makes GrpcLogAppender sleep 
> for a fixed time during appendLog. This makes the Alluxio master's write 
> throughput drop by 50%, which is unacceptable. The ops of the Alluxio master 
> are shown below:
>  !image-2023-09-13-15-44-00-933.png! 
> I noticed that this patch was introduced to keep the leader from being too 
> busy in some error conditions. Could we sleep only when an error is 
> detected (maybe not easy), or find a way to locate the error condition and 
> fix it completely? The performance degradation caused by sleeping on every 
> appendLog request may have been underestimated.
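
One way to picture the "sleep only on error" idea from the description is the following minimal sketch. It is not the Ratis implementation: the class ErrorTriggeredWait and all of its methods are hypothetical, and it only illustrates a policy that waits 0 ms after a successful append and backs off exponentially, up to a cap, after consecutive errors.

```java
// Hypothetical policy object, not a Ratis API: the caller reports the
// outcome of each append and asks how long to wait before the next one.
public class ErrorTriggeredWait {
    private final long minWaitMs;   // first wait after an error
    private final long maxWaitMs;   // upper bound for the back-off
    private int consecutiveErrors = 0;

    public ErrorTriggeredWait(long minWaitMs, long maxWaitMs) {
        if (minWaitMs <= 0 || maxWaitMs < minWaitMs) {
            throw new IllegalArgumentException("require 0 < minWaitMs <= maxWaitMs");
        }
        this.minWaitMs = minWaitMs;
        this.maxWaitMs = maxWaitMs;
    }

    // A successful append clears the error streak, so the next wait is 0.
    public void onSuccess() {
        consecutiveErrors = 0;
    }

    // A failed append extends the error streak.
    public void onError() {
        if (consecutiveErrors < Integer.MAX_VALUE) {
            consecutiveErrors++;
        }
    }

    // 0 while healthy; otherwise minWaitMs doubled per extra error, capped.
    public long nextWaitMs() {
        if (consecutiveErrors == 0) {
            return 0;
        }
        int shift = Math.min(consecutiveErrors - 1, 20); // bound the shift
        return Math.min(maxWaitMs, minWaitMs << shift);
    }
}
```

With minWaitMs = 10 and maxWaitMs = 100, the wait is 0 ms on the healthy path and 10, 20, 40, 80, 100 ms after one to five consecutive errors. Whether the error condition can be detected reliably at that point in the appender is exactly the open question raised in the description.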



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
