On Tue, 08 Apr 2014 10:31:44 +0200 Josef Johansson wrote:

> On 08/04/14 10:04, Christian Balzer wrote:
> > Hello,
> >
> > On Tue, 08 Apr 2014 09:31:18 +0200 Josef Johansson wrote:
> >
> >> Hi all,
> >>
> >> I am currently benchmarking a standard setup with Intel DC S3700 disks
> >> as journals and Hitachi 4TB disks as data drives, all on an LACP 10GbE
> >> network.
> >>
> > Unless that is the 400GB version of the DC S3700, you're already limiting
> > yourself to 365MB/s throughput with the 200GB variant; that is, if
> > sequential write speed is that important to you and you think you'll
> > ever get those 5 HDDs to write at full speed with Ceph (unlikely).
> >
> It's the 400GB version of the DC S3700, and yes, I'm aware that I need a
> 1:3 ratio to max out these disks, as they write sequential data at about
> 150MB/s.
> Our thinking is that a 1:5 ratio would cover the current demand, and
> that we could upgrade later.

I reckon you'll do fine, as in you'll run out of steam and IOPS before
hitting that limit.
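As a back-of-the-envelope sketch of that ratio argument (the 150MB/s per-HDD figure is the one quoted above; the 460MB/s SSD figure is the DC S3700 400GB spec-sheet sequential write number, assumed here, not measured in this thread):

```python
# Rough check: is one journal SSD the sequential-write bottleneck
# for the 5 spinning OSDs behind it?

ssd_seq_write = 460   # MB/s, Intel DC S3700 400GB (spec-sheet figure)
hdd_seq_write = 150   # MB/s per 4TB Hitachi, as quoted in the thread

osds_per_journal = 5
aggregate_hdd = osds_per_journal * hdd_seq_write  # 750 MB/s combined

# Every write lands on the journal first, so sustained sequential
# throughput for those 5 OSDs is capped by whichever is smaller:
bottleneck = min(ssd_seq_write, aggregate_hdd)
print(f"aggregate HDDs: {aggregate_hdd} MB/s, effective cap: {bottleneck} MB/s")
```

Which is why a 1:3 ratio (3 x 150 = 450 MB/s) roughly saturates the SSD, while at 1:5 the journal is the sequential bottleneck on paper; in practice, as noted above, IOPS run out first.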
> >> The size of my journals is 25GB each, and I have two journals per
> >> machine, with 5 OSDs per journal, and 5 machines in total. We
> >> currently use the optimal tunables, and the version of Ceph is the
> >> latest Dumpling.
> >>
> >> Benchmarking writes with RBD shows that there's no problem hitting
> >> upper levels on the 4TB disks with sequential data, thus maxing out
> >> 10GbE. At this moment we see full utilization on the journals.
> >>
> >> But lowering the byte size to 4k shows that the journals are utilized
> >> to about 20%, and the 4TB disks 100%. (rados -p <pool> -b 4096 -t 256
> >> 100 write)
> >>
> > When you say utilization, I assume you're talking about iostat or
> > atop output?
> Yes, the utilization is from iostat.
> > That's not a bug, that's comparing apples to oranges.
> You mean comparing the iostat results with the ones from the rados
> benchmark?
> > The rados bench default is 4MB, which not only happens to be the
> > default RBD object size but also generates a nice amount of
> > bandwidth.
> >
> > At 4k writes your SSD is obviously bored, while the actual OSD needs to
> > handle all those writes and becomes limited by the IOPS it can perform.
> Yes, it's quite bored and just shuffles data.
> Maybe I've been thinking about this the wrong way, but shouldn't the
> journal buffer more, until the journal partition is full or the flush
> interval is met?

Take a look at "journal queue max ops", which has a default of a mere
500, so that's full after 2 seconds. ^o^

Cheers,

Christian

> Right now the rados benchmark gets about 1MB/s throughput. I really
> don't know what to expect though, but it seems quite slow.
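For reference, a hedged sketch of raising those journal queue limits in ceph.conf; the 500-ops default is the one quoted above, the bytes default and the raised values are illustrative assumptions to verify against your own Ceph version before deploying:

```ini
[osd]
; Default is 500 ops, which a 4k workload can fill in a couple of
; seconds. Raised value is illustrative, not a tested recommendation.
journal queue max ops = 5000
; Default is commonly 33554432 (32 MB); raise alongside the ops limit.
journal queue max bytes = 536870912
```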
> sudo rados bench -p shared-1 -b 4096 300 write
> Maintaining 16 concurrent writes of 4096 bytes for up to 300 seconds or 0 objects
> Object prefix: benchmark_data_px1_1502
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
>     0       0         0         0         0         0         -          0
>     1      16       203       187  0.730312  0.730469  0.030537   0.080467
>     2      16       397       381  0.744003  0.757812  0.141118  0.0811331
>     3      16       625       609  0.792841  0.890625  0.017979  0.0776631
>     4      16       889       873  0.852415   1.03125   0.10221  0.0725933
>     5      16      1122      1106  0.863941  0.910156  0.001871  0.0709095
>     6      16      1437      1421  0.924995   1.23047  0.035859  0.0665901
>
> Thanks for helping me out,
> Josef
>
> > Regards,
> >
> > Christian
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Christian Balzer        Network/Systems Engineer
[email protected]          Global OnLine Japan/Fusion Communications
http://www.gol.com/
