On 08/04/14 10:39, Christian Balzer wrote:
> On Tue, 08 Apr 2014 10:31:44 +0200 Josef Johansson wrote:
>
>> On 08/04/14 10:04, Christian Balzer wrote:
>>> Hello,
>>>
>>> On Tue, 08 Apr 2014 09:31:18 +0200 Josef Johansson wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently benchmarking a standard setup with Intel DC S3700 disks
>>>> as journals and Hitachi 4TB disks as data drives, all on a LACP 10GbE
>>>> network.
>>>>
>>> Unless that is the 400GB version of the DC S3700, you're already limiting
>>> yourself to 365MB/s throughput with the 200GB variant.
>>> That is, if sequential write speed is that important to you and you think
>>> you'll ever get those 5 HDDs to write at full speed with Ceph (unlikely).
>>>
>> It's the 400GB version of the DC S3700, and yes, I'm aware that I need a
>> 1:3 ratio to max out these disks, as they write sequential data at about
>> 150MB/s.
>> But our thinking is that a 1:5 ratio will cover the current demand, and we
>> could upgrade later.
> I'd reckon you'll do fine, as in run out of steam and IOPS before hitting
> that limit.
>
>>>> My journals are 25GB each, and I have two journals per machine, with
>>>> 5 OSDs per journal and 5 machines in total. We currently use the
>>>> optimal tunables and the version of Ceph is the latest dumpling.
>>>>
>>>> Benchmarking writes with rbd shows that there's no problem hitting the
>>>> upper levels on the 4TB disks with sequential data, thus maxing out
>>>> 10GbE. At that point we see full utilization on the journals.
>>>>
>>>> But lowering the byte size to 4k shows that the journals are utilized
>>>> to about 20%, and the 4TB disks 100%. (rados bench -p <pool> -b 4096
>>>> -t 256 100 write)
>>>>
>>> When you say utilization I assume you're talking about iostat or
>>> atop output?
>> Yes, the utilization is from iostat.
>>> That's not a bug, that's comparing apples to oranges.
>> You mean comparing the iostat results with the ones from the rados
>> benchmark?
>>> The rados bench default is 4MB, which not only happens to be the
>>> default RBD object size but also generates a nice amount of
>>> bandwidth.
>>>
>>> With 4k writes your SSD is obviously bored, but the actual OSD needs to
>>> handle all those writes and becomes limited by the IOPS it can perform.
>> Yes, it's quite bored and just shuffles data.
>> Maybe I've been thinking about this the wrong way,
>> but shouldn't the journal buffer more, until the journal partition is full
>> or the flush_interval is reached?
>>
> Take a look at "journal queue max ops", which has a default of a mere 500,
> so that's full after 2 seconds. ^o^
Hm, that makes sense.
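
For reference, this is roughly what I put in ceph.conf (under [osd]) before
re-running the benchmark. The option is the one you mention, the value below
is just one of the ones I tried, and I restarted the OSDs after each change
to be sure it took effect:

    [osd]
        # journal admission limit; the default of 500 ops is what a
        # 4k client load fills within a couple of seconds
        journal queue max ops = 5000

I also pushed the same value with 'ceph tell osd.* injectargs', but since I'm
not sure the journal throttle picks that up at runtime, I restarted the OSDs
anyway.
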
So, I tested out both a low value (5000) and a large value (6553600), but it
didn't seem to change anything.

Any chance I could dump the current values from a running OSD, to actually
see what is in use? (See the PS at the bottom for the sort of thing I mean.)

Cheers,
Josef

> Cheers,
>
> Christian
>
>> Right now the rados benchmark gets about 1MB/s throughput. I really
>> don't know what is expected though, but it seems quite slow.
>>
>> sudo rados bench -p shared-1 -b 4096 300 write
>>  Maintaining 16 concurrent writes of 4096 bytes for up to 300 seconds or 0 objects
>>  Object prefix: benchmark_data_px1_1502
>>    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
>>      0       0         0         0         0         0         -          0
>>      1      16       203       187  0.730312  0.730469  0.030537   0.080467
>>      2      16       397       381  0.744003  0.757812  0.141118  0.0811331
>>      3      16       625       609  0.792841  0.890625  0.017979  0.0776631
>>      4      16       889       873  0.852415   1.03125   0.10221  0.0725933
>>      5      16      1122      1106  0.863941  0.910156  0.001871  0.0709095
>>      6      16      1437      1421  0.924995   1.23047  0.035859  0.0665901
>>
>> Thanks for helping me out,
>> Josef
>>> Regards,
>>>
>>> Christian
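
PS. For the archives, the sort of dump I had in mind is something along these
lines, via the OSD's admin socket. I'm assuming the default socket path under
/var/run/ceph and osd.0 here; adjust the id and path to the local setup:

    # show every option the running OSD is actually using,
    # filtered down to the journal settings
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep journal

That should make it obvious whether the new queue limits actually made it into
the running daemon or not.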