On Tue, 08 Apr 2014 10:31:44 +0200 Josef Johansson wrote:

> On 08/04/14 10:04, Christian Balzer wrote:
> > Hello,
> >
> > On Tue, 08 Apr 2014 09:31:18 +0200 Josef Johansson wrote:
> >
> >> Hi all,
> >>
> >> I am currently benchmarking a standard setup with Intel DC S3700 disks
> >> as journals and Hitachi 4TB-disks as data-drives, all on LACP 10GbE
> >> network.
> >>
> > Unless that is the 400GB version of the DC S3700, you're already
> > limiting yourself to 365MB/s throughput with the 200GB variant.
> > That only matters if sequential write speed is that important to you
> > and you think you'll ever get those 5 HDs to write at full speed with
> > Ceph (unlikely).
> >  
> It's the 400GB version of the DC3700, and yes, I'm aware that I need a
> 1:3 ratio to max out these disks, as they write sequential data at about
> 150MB/s.
> Our thinking is that a 1:5 ratio covers the current demand, and we can
> upgrade later if needed.
I'd reckon you'll do fine, as in run out of steam and IOPS before hitting
that limit.
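For reference, the ratio arithmetic works out as below. This is only a back-of-envelope sketch; the 460 MB/s figure for the 400GB DC S3700 is an assumed spec-sheet number, not a measurement from this thread.

```python
# Rough journal-saturation math with the figures quoted in the thread.
SSD_SEQ_MBPS = 460   # assumed sequential write rate, 400GB DC S3700
HDD_SEQ_MBPS = 150   # approximate Hitachi 4TB sequential write rate

# With 5 OSDs behind one journal, aggregate sequential demand exceeds
# what the SSD can absorb:
demand = 5 * HDD_SEQ_MBPS
print(demand, demand > SSD_SEQ_MBPS)   # 750 True -> journal-bound

# A 1:3 ratio keeps aggregate demand at or under the SSD's limit:
assert 3 * HDD_SEQ_MBPS <= SSD_SEQ_MBPS
```

As the thread notes, though, small-block IOPS on the spinners will be the real bottleneck long before this sequential limit matters.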

> >> The size of my journals are 25GB each, and I have two journals per
> >> machine, with 5 OSDs per journal, with 5 machines in total. We
> >> currently use the tunables optimal and the version of ceph is the
> >> latest dumpling.
> >>
> >> Benchmarking writes with rbd show that there's no problem hitting
> >> upper levels on the 4TB-disks with sequential data, thus maxing out
> >> 10GbE. At this moment we see full utilization on the journals.
> >>
> >> But lowering the byte-size to 4k shows that the journals are utilized
> >> to about 20%, and the 4TB-disks 100%. (rados bench -p <pool> -b 4096
> >> -t 256 100 write)
> >>
> > When you're saying utilization I assume you're talking about iostat or
> > atop output?
> Yes, the utilization is iostat.
> > That's not a bug, that's comparing apples to oranges.
> You mean comparing iostat-results with the ones from rados benchmark?
> > The rados bench default is 4MB, which not only happens to be the
> > default RBD object size but also generates a nice amount of
> > bandwidth. 
> >
> > At 4k writes your SSD is obviously bored, but the actual OSD needs to
> > handle all those writes and becomes limited by the IOPS it can perform.
> Yes, it's quite bored and just shuffles data.
> Maybe I've been thinking about this the wrong way, but shouldn't the
> journal keep buffering until the journal partition is full or the
> flush interval is met?
> 
Take a look at "journal queue max ops", which has a default of a mere 500,
so that's full after 2 seconds. ^o^
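The arithmetic behind that "full after 2 seconds" remark, as a sketch (the 250 ops/s rate is an assumed round figure, roughly what the 4k bench in this thread sustains):

```python
# "journal queue max ops" defaults to 500 in dumpling; at a few hundred
# small writes per second the queue fills almost immediately, so the
# journal cannot absorb a sustained 4k workload on its own.
JOURNAL_QUEUE_MAX_OPS = 500   # dumpling default
ops_per_sec = 250             # assumed, roughly the observed 4k rate

fill_seconds = JOURNAL_QUEUE_MAX_OPS / ops_per_sec
print(fill_seconds)   # 2.0
```

Raising that option (and the related journal queue/write limits) lets the journal buffer more before backpressure kicks in, but it only delays the point where the spinners have to keep up.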

Cheers,

Christian

> Right now the rados benchmark gets about 1MB/s throughput. I really
> don't know what to expect, but it seems quite slow.
> 
> sudo rados bench -p shared-1 -b 4096 300 write
>  Maintaining 16 concurrent writes of 4096 bytes for up to 300 seconds or
> 0 objects
>  Object prefix: benchmark_data_px1_1502
>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>      0       0         0         0         0         0         -         0
>      1      16       203       187  0.730312  0.730469  0.030537  0.080467
>      2      16       397       381  0.744003  0.757812  0.141118  0.0811331
>      3      16       625       609  0.792841  0.890625  0.017979  0.0776631
>      4      16       889       873  0.852415   1.03125   0.10221   0.0725933
>      5      16      1122      1106  0.863941  0.910156  0.001871  0.0709095
>      6      16      1437      1421  0.924995   1.23047  0.035859  0.0665901
> 
> Thanks for helping me out,
> Josef
> > Regards,
> >
> > Christian
> 
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer                
[email protected]           Global OnLine Japan/Fusion Communications
http://www.gol.com/