I have not been able to make any headway on this after some significant effort.

- Tested all 48 SSDs with fio directly; all tested within 10% of each other for 4k IOPS in rand|seq read|write.
- Disabled all CPU power saving.
- Tested with rbd cache both enabled and disabled on the client.
- Tested with drive caches enabled and disabled (hdparm).
- Minimal TCP retransmissions under load (<10 over a 2 minute duration).
- No drops/pause frames noted on upstream switches.
- CPU load on OSD nodes peaks at ~6.
- iostat shows a peak of 15ms under read/write workloads; %util peaks at about 10%.
- Swapped out the RBD client for a bigger box, since the load was peaking at 16. Now a 24 core box, load still peaks at 16.
- Disabled cephx signatures.
- Verified hardware health (nothing in dmesg, nothing in CIMC fault logs or storage controller logs).
- Tested multiple SSDs at once to find the controller's IOPS limit, which is apparently 650k @ 4k.
Nothing has made a noticeable difference here. I'm pretty baffled as to what would be causing the awful sequential read and write performance while still allowing good random r/w speeds.

I switched up fio testing methodologies to use more threads, but this didn't seem to help either:

    [global]
    bs=4k
    ioengine=rbd
    iodepth=32
    size=5g
    runtime=120
    numjobs=4
    group_reporting=1
    pool=rbd_af1
    rbdname=image1

    [seq-read]
    rw=read
    stonewall

    [rand-read]
    rw=randread
    stonewall

    [seq-write]
    rw=write
    stonewall

    [rand-write]
    rw=randwrite
    stonewall

Any pointers are appreciated at this point. I've been following other threads on the mailing list related to RBD performance, and looked through the archives, but none of the solutions that worked for others seem to have helped this setup.

Thanks,
Anthony

________________________________
From: Anthony Brandelli (abrandel) <abran...@cisco.com>
Sent: Tuesday, January 14, 2020 12:43 AM
To: firstname.lastname@example.org <email@example.com>
Subject: Slow Performance - Sequential IO

I have a newly set up test cluster that is giving some surprising numbers when running fio against an RBD. The end goal here is to see how viable a Ceph based iSCSI SAN of sorts is for VMware clusters, which require a bunch of random IO.
Hardware:
2x E5-2630L v2 (2.4GHz, 6 core)
256GB RAM
2x 10gbps bonded network, Intel X520
LSI 9271-8i, SSDs used for OSDs in JBOD mode
Mons: 2x 1.2TB 10K SAS in RAID1
OSDs: 12x Samsung MZ6ER800HAGL-00003 800GB SAS SSDs, super cap/power loss protection

Cluster setup:
Three mon nodes, four OSD nodes
Two OSDs per SSD
Replica 3 pool
Ceph 14.2.5

Ceph status:
  cluster:
    id:     e3d93b4a-520c-4d82-a135-97d0bda3e69d
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

  services:
    mon: 3 daemons, quorum mon1,mon2,mon3 (age 6d)
    mgr: mon2(active, since 6d), standbys: mon3, mon1
    osd: 96 osds: 96 up (since 3d), 96 in (since 3d)

  data:
    pools:   1 pools, 3072 pgs
    objects: 857.00k objects, 1.8 TiB
    usage:   432 GiB used, 34 TiB / 35 TiB avail
    pgs:     3072 active+clean

Network between nodes tests at 9.88gbps. Direct testing of the SSDs using a 4K block in fio shows 127k seq read, 86k random read, 107k seq write, 52k random write IOPS. No high CPU load/interface saturation is noted when running tests against the rbd.

When testing with a 4K block size against an RBD on a dedicated metal test host (same specs as other cluster nodes noted above) I get the following (command similar to fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=32 -rw=XXXX -pool=scbench -runtime=60 -rbdname=datatest):

10k sequential read iops
69k random read iops
13k sequential write iops
22k random write iops

I'm not clear why the random ops, especially read, would be so much quicker compared to the sequential ops.

Any pointers appreciated.

Thanks,
Anthony
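
For anyone sanity-checking the figures above, here is a rough sketch of the arithmetic in Python. It uses only numbers quoted in this message (node, SSD, OSD and PG counts, and the 4k IOPS results); the variable and function names are made up for illustration, and throughput is expressed in MiB/s assuming 4k means 4096 bytes.

```python
# Sanity-check arithmetic using only figures quoted in this message.
# All names here are illustrative; nothing below queries a real cluster.

KIB = 1024
MIB = 1024 * KIB

# Cluster layout as described under "Hardware" and "Cluster setup".
osd_nodes = 4
ssds_per_node = 12
osds_per_ssd = 2
replicas = 3
pgs = 3072

ssds = osd_nodes * ssds_per_node              # total SSDs in the cluster
osds = ssds * osds_per_ssd                    # total OSD daemons
pg_replicas_per_osd = pgs * replicas / osds   # average PG copies per OSD

# 4k IOPS figures: raw SSD via fio vs. the RBD image via ioengine=rbd.
raw_iops = {"seq read": 127_000, "rand read": 86_000,
            "seq write": 107_000, "rand write": 52_000}
rbd_iops = {"seq read": 10_000, "rand read": 69_000,
            "seq write": 13_000, "rand write": 22_000}


def mib_per_s(iops, block_size=4 * KIB):
    """Convert an IOPS figure at a fixed block size to MiB/s."""
    return iops * block_size / MIB


print(f"SSDs: {ssds}, OSDs: {osds}, PG replicas per OSD: {pg_replicas_per_osd:.0f}")
for name, iops in rbd_iops.items():
    print(f"RBD {name:10s}: {iops:>7d} IOPS = {mib_per_s(iops):7.1f} MiB/s")

# Random-to-sequential ratio on the RBD, for reads and for writes.
read_ratio = rbd_iops["rand read"] / rbd_iops["seq read"]
write_ratio = rbd_iops["rand write"] / rbd_iops["seq write"]
print(f"rand/seq on RBD: read {read_ratio:.1f}x, write {write_ratio:.2f}x")
```

The layout works out to 48 SSDs and 96 OSDs, matching the osd count in the status output, and the sequential read result corresponds to roughly 39 MiB/s, far below what the 2x 10gbps links could carry.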