> -----Original Message-----
> From: ceph-users [mailto:[email protected]] On Behalf Of
> Willem Jan Withagen
> Sent: 26 June 2017 14:35
> To: Christian Wuerdig <[email protected]>
> Cc: Ceph Users <[email protected]>
> Subject: Re: [ceph-users] Ceph random read IOPS
>
> On 26-6-2017 09:01, Christian Wuerdig wrote:
> > Well, preferring faster clock CPUs for SSD scenarios has been floated
> > several times over the last few months on this list. And realistic or
> > not, Nick's and Kostas' setup are similar enough (testing single disk)
> > that it's a distinct possibility.
> > Anyway, as mentioned, measuring the performance counters would
> > probably provide more insight.
>
> I read the advice as:
> prefer GHz over cores.
>
> And especially since there is a sort of balance between either GHz or
> cores, that can be an expensive choice. Getting both means you have to
> pay substantially more money.
>
> And for an average Ceph server with plenty of OSDs, I personally just
> don't buy that. There you'd have to look at the total throughput of the
> system, and latency is only one of many factors.
>
> Let alone in a cluster with several hosts (and/or racks). There the
> latency is dictated by the network, so a bad choice of network card or
> switch will outdo any extra cycles that your CPU can burn.
>
> I think that testing just 1 OSD is testing artifacts, and has very
> little to do with running an actual Ceph cluster.
>
> So if one would like to test this, the test setup should be something
> like: 3 hosts with something like 3 disks per host, min_size=2, and a
> nice workload.
> Then turn the GHz knob and see what happens with client latency and
> throughput.
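That knob-turning could be scripted. A minimal sketch, assuming `cpupower` and `fio` are installed and a QD=1 job file exists (the name `qd1-write.fio` is hypothetical), which just builds the command pairs for such a sweep:

```python
def sweep_commands(freqs_ghz, fio_job="qd1-write.fio"):
    """Build command pairs for a frequency sweep: pin the cpufreq
    min/max limits to one value, then run the same fio workload."""
    cmds = []
    for f in freqs_ghz:
        # cpupower frequency-set -d / -u set the lower / upper frequency limits
        cmds.append(["cpupower", "frequency-set",
                     "-d", f"{f}GHz", "-u", f"{f}GHz"])
        cmds.append(["fio", fio_job, "--output-format=json"])
    return cmds

for cmd in sweep_commands([0.9, 1.8, 2.7, 3.6]):
    print(" ".join(cmd))
```

Running the pairs via subprocess and parsing fio's JSON output would then give client latency and throughput per frequency step.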
Did similar tests last summer. 5 nodes with 12x 7.2k disks each, connected
via 10G. NVMe journal. 3x replica pool.
First test was with C-states left on auto and frequency scaling leaving the
cores at their lowest frequency of 900MHz. The cluster will quite happily do
a couple of thousand IOs without generating enough CPU load to boost the 4
cores up to max C-state or frequency.
With small background IO going on, a QD=1 sequential 4kB write was done with
the following results:
write: io=115268KB, bw=1670.1KB/s, iops=417, runt= 68986msec
slat (usec): min=2, max=414, avg= 4.41, stdev= 3.81
clat (usec): min=966, max=27116, avg=2386.84, stdev=571.57
lat (usec): min=970, max=27120, avg=2391.25, stdev=571.69
clat percentiles (usec):
| 1.00th=[ 1480], 5.00th=[ 1688], 10.00th=[ 1912], 20.00th=[ 2128],
| 30.00th=[ 2192], 40.00th=[ 2288], 50.00th=[ 2352], 60.00th=[ 2448],
| 70.00th=[ 2576], 80.00th=[ 2704], 90.00th=[ 2832], 95.00th=[ 2960],
| 99.00th=[ 3312], 99.50th=[ 3536], 99.90th=[ 6112], 99.95th=[ 9536],
| 99.99th=[22400]
So just under 2.5ms write latency.
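As a sanity check: at QD=1 only one IO is ever in flight, so IOPS is simply the reciprocal of the mean per-op latency, which lines up with the fio output above:

```python
# At queue depth 1, IOPS ~= 1 / mean latency.
# fio reported avg lat 2391.25 usec for this run.
mean_lat_s = 2391.25e-6
iops = 1.0 / mean_lat_s
print(round(iops))  # ~418, in line with fio's reported 417
```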
I don't have the results for adjusting C-states and frequency scaling
separately, but adjusting either got me a boost. Forcing C1 and a max
frequency of 3.6GHz got me:
write: io=105900KB, bw=5715.7KB/s, iops=1428, runt= 18528msec
slat (usec): min=2, max=106, avg= 3.50, stdev= 1.31
clat (usec): min=491, max=32099, avg=694.16, stdev=491.91
lat (usec): min=494, max=32102, avg=697.66, stdev=492.04
clat percentiles (usec):
| 1.00th=[ 540], 5.00th=[ 572], 10.00th=[ 588], 20.00th=[ 604],
| 30.00th=[ 620], 40.00th=[ 636], 50.00th=[ 652], 60.00th=[ 668],
| 70.00th=[ 692], 80.00th=[ 716], 90.00th=[ 764], 95.00th=[ 820],
| 99.00th=[ 1448], 99.50th=[ 2320], 99.90th=[ 7584], 99.95th=[11712],
| 99.99th=[24448]
Quite a bit faster. These are best-case figures, though; if any substantial
workload is run, the average tends to hover around 1ms latency.
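Put side by side, the two runs show the same factor of roughly 3.4x in both latency and IOPS, as expected for a QD=1 workload:

```python
# Figures taken from the two fio runs quoted above.
slow = {"lat_us": 2391.25, "iops": 417}   # 900MHz, auto C-states
fast = {"lat_us": 697.66, "iops": 1428}   # 3.6GHz, forced C1

lat_speedup = slow["lat_us"] / fast["lat_us"]
iops_speedup = fast["iops"] / slow["iops"]
print(f"latency improved {lat_speedup:.2f}x, IOPS improved {iops_speedup:.2f}x")
```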
Nick
>
> --WjW
>
> > On Sun, Jun 25, 2017 at 4:53 AM, Willem Jan Withagen
> > <[email protected]> wrote:
> >
> >
> >
> > On 24 Jun 2017 at 14:17, Maged Mokhtar <[email protected]>
> > wrote:
> >
> >> My understanding was this test is targeting latency more than IOPS.
> >> This is probably why it was run using QD=1. It also makes sense that
> >> CPU frequency will be more important than cores.
> >>
> >
> > But then it is not generic enough to be used as advice!
> > It is just a line in 3D space, as there are so many other variables.
> >
> > --WjW
> >
> >> On 2017-06-24 12:52, Willem Jan Withagen wrote:
> >>
> >>> On 24-6-2017 05:30, Christian Wuerdig wrote:
> >>>> The general advice floating around is that you want CPUs with high
> >>>> clock speeds rather than more cores to reduce latency and increase
> >>>> IOPS for SSD setups (see also
> >>>> http://www.sys-pro.co.uk/ceph-storage-fast-cpus-ssd-performance/).
> >>>> So something like an E5-2667V4 might bring better results in that
> >>>> situation.
> >>>> Also there was some talk about disabling the processor C-states in
> >>>> order to bring latency down (something like this should be easy to
> >>>> test: https://stackoverflow.com/a/22482722/220986)
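[For reference: the trick behind that stackoverflow link is the kernel PM QoS interface. Writing a 32-bit latency target of 0 microseconds to /dev/cpu_dma_latency and holding the file descriptor open keeps the CPUs out of deeper C-states for the duration. A minimal sketch; it needs root on a real machine, and the path is parameterised only so it can be exercised against an ordinary file:]

```python
import struct

def hold_latency_target(usec=0, path="/dev/cpu_dma_latency"):
    """Request a maximum wakeup latency via the kernel PM QoS interface.

    The request is only honoured while the returned file object stays
    open, so the caller must keep a reference to it for the whole
    benchmark run.
    """
    f = open(path, "wb", buffering=0)
    f.write(struct.pack("<I", usec))  # 32-bit microsecond target
    return f

# Usage (as root): fd = hold_latency_target(0); run the benchmark; fd.close()
```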
> >>>
> >>> I would be very careful about calling this general advice...
> >>>
> >>> Although the article is interesting, it is rather single sided.
> >>>
> >>> The only thing it shows is that there is a linear relation between
> >>> clock speed and write or read speeds.
> >>> The article is rather vague on how and what is actually tested.
> >>>
> >>> By just running a single OSD with no replication, a lot of the
> >>> functionality is left out of the equation.
> >>> Nobody runs just 1 OSD on a box in a normal cluster host.
> >>>
> >>> Not using a serious SSD is another source of noise in the
> >>> conclusion.
> >>> More queue depth can/will certainly have an impact on concurrency.
> >>>
> >>> I would call this an observation, and nothing more.
> >>>
> >>> --WjW
> >>>>
> >>>> On Sat, Jun 24, 2017 at 1:28 AM, Kostas Paraskevopoulos
> >>>> <[email protected]> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> We are in the process of evaluating the performance of a testing
> >>>> cluster (3 nodes) with Ceph Jewel. Our setup consists of:
> >>>> 3 monitors (VMs)
> >>>> 2 physical servers, each connected to 1 JBOD, running Ubuntu
> >>>> Server 16.04
> >>>>
> >>>> Each server has 32 threads @2.1GHz and 128GB RAM.
> >>>> The disk distribution per server is:
> >>>> 38 * HUS726020ALS210 (SAS rotational)
> >>>> 2 * HUSMH8010BSS200 (SAS SSD for journals)
> >>>> 2 * ST1920FM0043 (SAS SSD for data)
> >>>> 1 * INTEL SSDPEDME012T4 (NVME measured with fio ~300K iops)
> >>>>
> >>>> Since we don't currently have a 10Gbit switch, we test the
> >>>> performance with the cluster in a degraded state, the noout flag
> >>>> set, and we mount rbd images on the powered-on OSD node. We
> >>>> confirmed that the network is not saturated during the tests.
> >>>>
> >>>> We ran tests on the NVMe disk and the pool created on this disk,
> >>>> where we hoped to get the most performance without being limited by
> >>>> the hardware specs, since we have more disks than CPU threads.
> >>>>
> >>>> The NVMe disk was at first partitioned with one partition and the
> >>>> journal on the same disk. The performance on random 4K reads topped
> >>>> out at 50K IOPS. We then removed the OSD and repartitioned with 4
> >>>> data partitions and 4 journals on the same disk. The performance
> >>>> didn't increase significantly. Also, since we run read tests, the
> >>>> journals shouldn't cause performance issues.
> >>>>
> >>>> We then ran 4 fio processes in parallel on the same mounted rbd
> >>>> image and the total IOPS reached 100K. More parallel fio processes
> >>>> didn't increase the measured IOPS.
> >>>>
> >>>> Our ceph.conf is pretty basic (debug is set to 0/0 for everything)
> >>>> and the crushmap just defines the different buckets/rules for the
> >>>> disk separation (rotational, ssd, nvme) in order to create the
> >>>> required pools.
> >>>>
> >>>> Is the performance of 100,000 IOPS for random 4K reads normal for
> >>>> a disk that on the same benchmark does more than 300K IOPS on the
> >>>> same hardware, or are we missing something?
> >>>>
> >>>> Best regards,
> >>>> Kostas
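[One crude way to frame that gap is the CPU budget per IO implied by the numbers in the post; a back-of-envelope sketch, assuming all 32 threads of the single active node are available to Ceph, which is an assumption:]

```python
# Back-of-envelope: CPU cycles available per IO at the observed rate.
# The post reports 32 threads @ 2.1GHz per server and ~100K IOPS with
# the cluster degraded to one powered-on OSD node.
threads = 32
freq_hz = 2.1e9
iops = 100_000
cycles_per_io = threads * freq_hz / iops
print(f"{cycles_per_io:,.0f} cycles of CPU budget per IO")  # 672,000
```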
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> [email protected]
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com