Some suggestions:

Monitor raw resources such as CPU %util, raw disk %util/busy, and raw disk IOPS.

Instead of running a mix of workloads at this stage, narrow it down first, for example to rbd random writes with a 4k block size, then change one parameter at a time, for example the block size. See how your cluster performs and what resource load you get step by step. Latency at 4M will not be the same as at 4k.

I would also run fio tests on the raw Nytro 1551 devices, including sync writes.

I would not recommend increasing readahead for random I/O.

I do not recommend making RAID0 out of the drives either.

/Maged


On 01/10/2019 02:12, Sasha Litvak wrote:
At this point, I have run out of ideas.  I changed nr_requests from 128 to 1024 and readahead from 128 to 4096, and tuned the nodes to performance-throughput.  However, I still get high latency during benchmark testing.  I attempted to disable the cache on the SSDs:

for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done   # -W 0 disables the drive write cache, -A 0 disables read-lookahead

and I think it did not make things any better.  I have H740 and H730 controllers with the drives in HBA mode.

Other than converting them one by one to RAID0, I am not sure what else I can try.

Any suggestions?


On Mon, Sep 30, 2019 at 2:45 PM Paul Emmerich <paul.emmer...@croit.io> wrote:

    BTW: commit and apply latency are the exact same thing since
    BlueStore, so don't bother looking at both.

    In fact, you should mostly be looking at the op_*_latency counters.


    Paul

    --
    Paul Emmerich

    Looking for help with your Ceph cluster? Contact us at
    https://croit.io

    croit GmbH
    Freseniusstr. 31h
    81247 München
    www.croit.io
    Tel: +49 89 1896585 90

    On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak <alexander.v.lit...@gmail.com> wrote:
    >
    > In my case, I am using premade Prometheus-sourced dashboards in Grafana.
    >
    > For individual latency, the queries look like this:
    >
    > irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
    > irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
    >
    > The other ones use
    >
    > ceph_osd_commit_latency_ms
    > ceph_osd_apply_latency_ms
    >
    > and graph their distribution over time
    >
    > Also, average OSD op latency
    >
    > avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
    > avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) / rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
    >
    > Average OSD apply + commit latency
    > avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
    > avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
    >
    >
    > On Mon, Sep 30, 2019 at 11:13 AM Marc Roos <m.r...@f1-outsourcing.eu> wrote:
    >>
    >>
    >> What parameters are you using exactly? I want to do a similar test on
    >> Luminous before I upgrade to Nautilus. I have quite a lot (74+):
    >>
    >> type_instance=Osd.opBeforeDequeueOpLat
    >> type_instance=Osd.opBeforeQueueOpLat
    >> type_instance=Osd.opLatency
    >> type_instance=Osd.opPrepareLatency
    >> type_instance=Osd.opProcessLatency
    >> type_instance=Osd.opRLatency
    >> type_instance=Osd.opRPrepareLatency
    >> type_instance=Osd.opRProcessLatency
    >> type_instance=Osd.opRwLatency
    >> type_instance=Osd.opRwPrepareLatency
    >> type_instance=Osd.opRwProcessLatency
    >> type_instance=Osd.opWLatency
    >> type_instance=Osd.opWPrepareLatency
    >> type_instance=Osd.opWProcessLatency
    >> type_instance=Osd.subopLatency
    >> type_instance=Osd.subopWLatency
    >> ...
    >> ...
    >>
    >>
    >>
    >>
    >>
    >> -----Original Message-----
    >> From: Alex Litvak [mailto:alexander.v.lit...@gmail.com]
    >> Sent: Sunday, 29 September 2019 13:06
    >> To: ceph-users@lists.ceph.com
    >> Cc: ceph-de...@vger.kernel.org
    >> Subject: [ceph-users] Commit and Apply latency on nautilus
    >>
    >> Hello everyone,
    >>
    >> I am running a number of parallel benchmark tests against the cluster
    >> that should be ready to go to production.
    >> I enabled Prometheus to monitor various information, and while the
    >> cluster stays healthy through the tests with no errors or slow requests,
    >> I noticed apply / commit latency jumping between 40 - 600 ms on
    >> multiple SSDs.  At the same time, op_read and op_write are on average
    >> below 0.25 ms in the worst case scenario.
    >>
    >> I am running Nautilus 14.2.2, all BlueStore, no separate NVMe devices
    >> for WAL/DB, 6 SSDs per node (Dell PowerEdge R440) with all drives
    >> Seagate Nytro 1551, OSDs spread across 6 nodes, running in
    >> containers.  Each node has plenty of RAM, with utilization ~ 25 GB
    >> during the benchmark runs.
    >>
    >> Here are benchmarks being run from 6 client systems in parallel,
    >> repeating the test for each block size in <4k,16k,128k,4M>.
    >>
    >> On an rbd-mapped partition local to each client:
    >>
    >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
    >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
    >> --group_reporting --time_based --rwmixread=70
    >>
    >> On a mounted CephFS volume, with each client storing test file(s) in its
    >> own sub-directory:
    >>
    >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
    >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
    >> --group_reporting --time_based --rwmixread=70
    >>
    >> dbench -t 30 30
    >>
    >> Could you please let me know whether such a huge jump in apply and
    >> commit latency is justified in my case, and whether I can do anything
    >> to improve / fix it.  Below is some additional cluster info.
    >>
    >> Thank you,
    >>
    >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
    >> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
    >>  6   ssd 1.74609  1.00000 1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7 TiB 5.21 0.90  44     up
    >> 12   ssd 1.74609  1.00000 1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7 TiB 5.47 0.95  40     up
    >> 18   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6 TiB 5.73 0.99  47     up
    >> 24   ssd 3.49219  1.00000 3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3 TiB 6.20 1.07  96     up
    >> 30   ssd 3.49219  1.00000 3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3 TiB 5.95 1.03  93     up
    >> 35   ssd 3.49219  1.00000 3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3 TiB 5.67 0.98 100     up
    >>  5   ssd 1.74609  1.00000 1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6 TiB 5.78 1.00  49     up
    >> 11   ssd 1.74609  1.00000 1.7 TiB 109 GiB 108 GiB  63 MiB  961 MiB 1.6 TiB 6.09 1.05  46     up
    >> 17   ssd 1.74609  1.00000 1.7 TiB 104 GiB 103 GiB 205 MiB  819 MiB 1.6 TiB 5.81 1.01  50     up
    >> 23   ssd 3.49219  1.00000 3.5 TiB 210 GiB 209 GiB 168 MiB  856 MiB 3.3 TiB 5.86 1.01  86     up
    >> 29   ssd 3.49219  1.00000 3.5 TiB 204 GiB 203 GiB 272 MiB  752 MiB 3.3 TiB 5.69 0.98  92     up
    >> 34   ssd 3.49219  1.00000 3.5 TiB 198 GiB 197 GiB 295 MiB  729 MiB 3.3 TiB 5.54 0.96  85     up
    >>  4   ssd 1.74609  1.00000 1.7 TiB 119 GiB 118 GiB  16 KiB 1024 MiB 1.6 TiB 6.67 1.15  50     up
    >> 10   ssd 1.74609  1.00000 1.7 TiB  95 GiB  94 GiB 183 MiB  841 MiB 1.7 TiB 5.31 0.92  46     up
    >> 16   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 122 MiB  902 MiB 1.6 TiB 5.72 0.99  50     up
    >> 22   ssd 3.49219  1.00000 3.5 TiB 218 GiB 217 GiB 109 MiB  915 MiB 3.3 TiB 6.11 1.06  91     up
    >> 28   ssd 3.49219  1.00000 3.5 TiB 198 GiB 197 GiB 343 MiB  681 MiB 3.3 TiB 5.54 0.96  95     up
    >> 33   ssd 3.49219  1.00000 3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 TiB 5.53 0.96  85     up
    >>  1   ssd 1.74609  1.00000 1.7 TiB 101 GiB 100 GiB 222 MiB  802 MiB 1.6 TiB 5.63 0.97  49     up
    >>  7   ssd 1.74609  1.00000 1.7 TiB 102 GiB 101 GiB 153 MiB  871 MiB 1.6 TiB 5.69 0.99  46     up
    >> 13   ssd 1.74609  1.00000 1.7 TiB 106 GiB 105 GiB  67 MiB  957 MiB 1.6 TiB 5.96 1.03  42     up
    >> 19   ssd 3.49219  1.00000 3.5 TiB 206 GiB 205 GiB 179 MiB  845 MiB 3.3 TiB 5.77 1.00  83     up
    >> 25   ssd 3.49219  1.00000 3.5 TiB 195 GiB 194 GiB 352 MiB  672 MiB 3.3 TiB 5.45 0.94  97     up
    >> 31   ssd 3.49219  1.00000 3.5 TiB 201 GiB 200 GiB 305 MiB  719 MiB 3.3 TiB 5.62 0.97  90     up
    >>  0   ssd 1.74609  1.00000 1.7 TiB 110 GiB 109 GiB  29 MiB  995 MiB 1.6 TiB 6.14 1.06  43     up
    >>  3   ssd 1.74609  1.00000 1.7 TiB 109 GiB 108 GiB  28 MiB  996 MiB 1.6 TiB 6.07 1.05  41     up
    >>  9   ssd 1.74609  1.00000 1.7 TiB 103 GiB 102 GiB 149 MiB  875 MiB 1.6 TiB 5.76 1.00  52     up
    >> 15   ssd 3.49219  1.00000 3.5 TiB 209 GiB 208 GiB 253 MiB  771 MiB 3.3 TiB 5.83 1.01  98     up
    >> 21   ssd 3.49219  1.00000 3.5 TiB 199 GiB 198 GiB 302 MiB  722 MiB 3.3 TiB 5.56 0.96  90     up
    >> 27   ssd 3.49219  1.00000 3.5 TiB 208 GiB 207 GiB 226 MiB  798 MiB 3.3 TiB 5.81 1.00  95     up
    >>  2   ssd 1.74609  1.00000 1.7 TiB  96 GiB  95 GiB 158 MiB  866 MiB 1.7 TiB 5.35 0.93  45     up
    >>  8   ssd 1.74609  1.00000 1.7 TiB 106 GiB 105 GiB 132 MiB  892 MiB 1.6 TiB 5.91 1.02  50     up
    >> 14   ssd 1.74609  1.00000 1.7 TiB  96 GiB  95 GiB 180 MiB  844 MiB 1.7 TiB 5.35 0.92  46     up
    >> 20   ssd 3.49219  1.00000 3.5 TiB 221 GiB 220 GiB 156 MiB  868 MiB 3.3 TiB 6.18 1.07 101     up
    >> 26   ssd 3.49219  1.00000 3.5 TiB 206 GiB 205 GiB 332 MiB  692 MiB 3.3 TiB 5.76 1.00  92     up
    >> 32   ssd 3.49219  1.00000 3.5 TiB 221 GiB 220 GiB  88 MiB  936 MiB 3.3 TiB 6.18 1.07  91     up
    >>                      TOTAL  94 TiB 5.5 TiB 5.4 TiB 6.4 GiB   30 GiB  89 TiB 5.78
    >> MIN/MAX VAR: 0.90/1.15  STDDEV: 0.30
    >>
    >>
    >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph -s
    >>    cluster:
    >>      id:     9b4468b7-5bf2-4964-8aec-4b2f4bee87ad
    >>      health: HEALTH_OK
    >>
    >>    services:
    >>      mon: 3 daemons, quorum storage2n1-la,storage2n2-la,storage2n3-la (age 9w)
    >>      mgr: storage2n2-la(active, since 9w), standbys: storage2n1-la, storage2n3-la
    >>      mds: cephfs:1 {0=storage2n6-la=up:active} 1 up:standby-replay 1 up:standby
    >>      osd: 36 osds: 36 up (since 9w), 36 in (since 9w)
    >>
    >>    data:
    >>      pools:   3 pools, 832 pgs
    >>      objects: 4.18M objects, 1.8 TiB
    >>      usage:   5.5 TiB used, 89 TiB / 94 TiB avail
    >>      pgs:     832 active+clean
    >>
    >>    io:
    >>      client:   852 B/s rd, 15 KiB/s wr, 4 op/s rd, 2 op/s wr
    >>
    >>
    >>
    >>
    >>


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
