Some suggestions:

Monitor raw resources such as CPU %util, raw disk %util/busy, and raw disk IOPS.

Instead of running a mix of workloads at this stage, narrow it down first, for example to rbd random writes with a 4k block size, then change one parameter at a time, for example the block size. See how your cluster performs and what resource load you get step by step. Latency at 4M will not be the same as at 4k.

I would also run fio tests on the raw Nytro 1551 devices, including sync writes.

I would not recommend increasing readahead for random I/O.

I do not recommend making RAID0 out of the drives either.

/Maged


On 01/10/2019 02:12, Sasha Litvak wrote:
At this point, I have run out of ideas.  I changed nr_requests from 128 to 1024 and readahead from 128 to 4096, and tuned the nodes to performance-throughput.  However, I still get high latency during benchmark testing.  I attempted to disable the cache on the SSDs:

for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done   # -W 0 disables the drive write cache, -A 0 disables read-lookahead

and I think it did not make things any better.  I have H740 and H730 controllers with the drives in HBA mode.

Other than converting them one by one to RAID0, I am not sure what else I can try.

Any suggestions?


On Mon, Sep 30, 2019 at 2:45 PM Paul Emmerich <paul.emmer...@croit.io> wrote:

    BTW: commit and apply latency are the exact same thing since
    BlueStore, so don't bother looking at both.

    In fact, you should mostly be looking at the op_*_latency counters.


    Paul

    --
    Paul Emmerich

    Looking for help with your Ceph cluster? Contact us at
    https://croit.io

    croit GmbH
    Freseniusstr. 31h
    81247 München
    www.croit.io
    Tel: +49 89 1896585 90

    On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak <alexander.v.lit...@gmail.com> wrote:
    >
    > In my case, I am using premade Prometheus-sourced dashboards in Grafana.
    >
    > For individual latency, the queries look like this:
    >
    > irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
    > irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
    >
    > The other ones use
    >
    > ceph_osd_commit_latency_ms
    > ceph_osd_apply_latency_ms
    >
    > and graph their distribution over time
    >
    > Also, average OSD op latency
    >
    > avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
    > avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
    >
    > Average OSD apply + commit latency
    > avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
    > avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
    >
    >
    > On Mon, Sep 30, 2019 at 11:13 AM Marc Roos <m.r...@f1-outsourcing.eu> wrote:
    >>
    >>
    >> What parameters are you using exactly? I want to do a similar test on
    >> Luminous before I upgrade to Nautilus. I have quite a lot (74+):
    >>
    >> type_instance=Osd.opBeforeDequeueOpLat
    >> type_instance=Osd.opBeforeQueueOpLat
    >> type_instance=Osd.opLatency
    >> type_instance=Osd.opPrepareLatency
    >> type_instance=Osd.opProcessLatency
    >> type_instance=Osd.opRLatency
    >> type_instance=Osd.opRPrepareLatency
    >> type_instance=Osd.opRProcessLatency
    >> type_instance=Osd.opRwLatency
    >> type_instance=Osd.opRwPrepareLatency
    >> type_instance=Osd.opRwProcessLatency
    >> type_instance=Osd.opWLatency
    >> type_instance=Osd.opWPrepareLatency
    >> type_instance=Osd.opWProcessLatency
    >> type_instance=Osd.subopLatency
    >> type_instance=Osd.subopWLatency
    >> ...
    >> ...
    >>
    >>
    >>
    >>
    >>
    >> -----Original Message-----
    >> From: Alex Litvak [mailto:alexander.v.lit...@gmail.com]
    >> Sent: Sunday, 29 September 2019 13:06
    >> To: ceph-users@lists.ceph.com
    >> Cc: ceph-de...@vger.kernel.org
    >> Subject: [ceph-users] Commit and Apply latency on nautilus
    >>
    >> Hello everyone,
    >>
    >> I am running a number of parallel benchmark tests against the cluster
    >> that should be ready to go to production.
    >> I enabled Prometheus to monitor various information, and while the
    >> cluster stays healthy through the tests with no errors or slow requests,
    >> I noticed apply / commit latency jumping between 40 - 600 ms on
    >> multiple SSDs.  At the same time, op_read and op_write are on average
    >> below 0.25 ms in the worst case scenario.
    >>
    >> I am running Nautilus 14.2.2, all BlueStore, no separate NVMe devices
    >> for WAL/DB, 6 SSDs per node (Dell PowerEdge R440) with all drives
    >> Seagate Nytro 1551, OSDs spread across 6 nodes, running in
    >> containers.  Each node has plenty of RAM, with utilization ~ 25 GB
    >> during the benchmark runs.
    >>
    >> Here are benchmarks being run from 6 client systems in parallel,
    >> repeating the test for each block size in <4k,16k,128k,4M>.
    >>
    >> On an rbd-mapped partition local to each client:
    >>
    >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
    >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
    >> --group_reporting --time_based --rwmixread=70
    >>
    >> On a mounted CephFS volume, with each client storing test file(s) in its
    >> own sub-directory:
    >>
    >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
    >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
    >> --group_reporting --time_based --rwmixread=70
    >>
    >> dbench -t 30 30
    >>
    >> Could you please let me know whether such a huge jump in apply and
    >> commit latency is justified in my case, and whether I can do anything
    >> to improve / fix it.  Below is some additional cluster info.
    >>
    >> Thank you,
    >>
    >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
    >> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
    >>  6   ssd 1.74609  1.00000 1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7 TiB 5.21 0.90  44     up
    >> 12   ssd 1.74609  1.00000 1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7 TiB 5.47 0.95  40     up
    >> 18   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6 TiB 5.73 0.99  47     up
    >> 24   ssd 3.49219  1.00000 3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3 TiB 6.20 1.07  96     up
    >> 30   ssd 3.49219  1.00000 3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3 TiB 5.95 1.03  93     up
    >> 35   ssd 3.49219  1.00000 3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3 TiB 5.67 0.98 100     up
    >>  5   ssd 1.74609  1.00000 1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6 TiB 5.78 1.00  49     up
    >> 11   ssd 1.74609  1.00000 1.7 TiB 109 GiB 108 GiB  63 MiB  961 MiB 1.6 TiB 6.09 1.05  46     up
    >> 17   ssd 1.74609  1.00000 1.7 TiB 104 GiB 103 GiB 205 MiB  819 MiB 1.6 TiB 5.81 1.01  50     up
    >> 23   ssd 3.49219  1.00000 3.5 TiB 210 GiB 209 GiB 168 MiB  856 MiB 3.3 TiB 5.86 1.01  86     up
    >> 29   ssd 3.49219  1.00000 3.5 TiB 204 GiB 203 GiB 272 MiB  752 MiB 3.3 TiB 5.69 0.98  92     up
    >> 34   ssd 3.49219  1.00000 3.5 TiB 198 GiB 197 GiB 295 MiB  729 MiB 3.3 TiB 5.54 0.96  85     up
    >>  4   ssd 1.74609  1.00000 1.7 TiB 119 GiB 118 GiB  16 KiB 1024 MiB 1.6 TiB 6.67 1.15  50     up
    >> 10   ssd 1.74609  1.00000 1.7 TiB  95 GiB  94 GiB 183 MiB  841 MiB 1.7 TiB 5.31 0.92  46     up
    >> 16   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 122 MiB  902 MiB 1.6 TiB 5.72 0.99  50     up
    >> 22   ssd 3.49219  1.00000 3.5 TiB 218 GiB 217 GiB 109 MiB  915 MiB 3.3 TiB 6.11 1.06  91     up
    >> 28   ssd 3.49219  1.00000 3.5 TiB 198 GiB 197 GiB 343 MiB  681 MiB 3.3 TiB 5.54 0.96  95     up
    >> 33   ssd 3.49219  1.00000 3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 TiB 5.53 0.96  85     up
    >>  1   ssd 1.74609  1.00000 1.7 TiB 101 GiB 100 GiB 222 MiB  802 MiB 1.6 TiB 5.63 0.97  49     up
    >>  7   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 153 MiB  871 MiB 1.6 TiB 5.69 0.99  46     up
    >> 13   ssd 1.74609  1.00000 1.7 TiB 106 GiB 105 GiB  67 MiB  957 MiB 1.6 TiB 5.96 1.03  42     up
    >> 19   ssd 3.49219  1.00000 3.5 TiB 206 GiB 205 GiB 179 MiB  845 MiB 3.3 TiB 5.77 1.00  83     up
    >> 25   ssd 3.49219  1.00000 3.5 TiB 195 GiB 194 GiB 352 MiB  672 MiB 3.3 TiB 5.45 0.94  97     up
    >> 31   ssd 3.49219  1.00000 3.5 TiB 201 GiB 200 GiB 305 MiB  719 MiB 3.3 TiB 5.62 0.97  90     up
    >>  0   ssd 1.74609  1.00000 1.7 TiB 110 GiB 109 GiB  29 MiB  995 MiB 1.6 TiB 6.14 1.06  43     up
    >>  3   ssd 1.74609  1.00000 1.7 TiB 109 GiB 108 GiB  28 MiB  996 MiB 1.6 TiB 6.07 1.05  41     up
    >>  9   ssd 1.74609  1.00000 1.7 TiB 103 GiB 102 GiB 149 MiB  875 MiB 1.6 TiB 5.76 1.00  52     up
    >> 15   ssd 3.49219  1.00000 3.5 TiB 209 GiB 208 GiB 253 MiB  771 MiB 3.3 TiB 5.83 1.01  98     up
    >> 21   ssd 3.49219  1.00000 3.5 TiB 199 GiB 198 GiB 302 MiB  722 MiB 3.3 TiB 5.56 0.96  90     up
    >> 27   ssd 3.49219  1.00000 3.5 TiB 208 GiB 207 GiB 226 MiB  798 MiB 3.3 TiB 5.81 1.00  95     up
    >>  2   ssd 1.74609  1.00000 1.7 TiB  96 GiB  95 GiB 158 MiB  866 MiB 1.7 TiB 5.35 0.93  45     up
    >>  8   ssd 1.74609  1.00000 1.7 TiB 106 GiB 105 GiB 132 MiB  892 MiB 1.6 TiB 5.91 1.02  50     up
    >> 14   ssd 1.74609  1.00000 1.7 TiB  96 GiB  95 GiB 180 MiB  844 MiB 1.7 TiB 5.35 0.92  46     up
    >> 20   ssd 3.49219  1.00000 3.5 TiB 221 GiB 220 GiB 156 MiB  868 MiB 3.3 TiB 6.18 1.07 101     up
    >> 26   ssd 3.49219  1.00000 3.5 TiB 206 GiB 205 GiB 332 MiB  692 MiB 3.3 TiB 5.76 1.00  92     up
    >> 32   ssd 3.49219  1.00000 3.5 TiB 221 GiB 220 GiB  88 MiB  936 MiB 3.3 TiB 6.18 1.07  91     up
    >>                      TOTAL  94 TiB 5.5 TiB 5.4 TiB 6.4 GiB   30 GiB  89 TiB 5.78
    >> MIN/MAX VAR: 0.90/1.15  STDDEV: 0.30
    >>
    >>
    >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph -s
    >>    cluster:
    >>      id:     9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
    >>      health: HEALTH_OK
    >>
    >>    services:
    >>      mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la (age 9w)
    >>      mgr: storage2n2-la(active, since 9w), standbys: storage2n1-la, storage2n3-la
    >>      mds: cephfs:1 {0=storage2n6-la=up:active} 1 up:standby-replay 1 up:standby
    >>      osd: 36 osds: 36 up (since 9w), 36 in (since 9w)
    >>
    >>    data:
    >>      pools:   3 pools, 832 pgs
    >>      objects: 4.18M objects, 1.8 TiB
    >>      usage:   5.5 TiB used, 89 TiB / 94 TiB avail
    >>      pgs:     832 active+clean
    >>
    >>    io:
    >>      client:   852 B/s rd, 15 KiB/s wr, 4 op/s rd, 2 op/s wr
    >>
    >>
    >>
    >>
    >>


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
