The increase in workload is from the client, which we have now adjusted. However,
the read statistics in ceph status are wrong and always show 2 times more; we
verified the disk reads across the cluster and they seem to be in line with the
client traffic.

Hence the issue seems to be in the ceph status read statistics.
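
For reference, the cross-check amounts to something like this: on each node, sum
the read throughput of the OSD data disks (e.g. with something along the lines of
`iostat -xm 60`; the exact device names vary per node) and compare the 5-node
total against the client rd figure reported by `ceph status`. In our case the
disk-level totals line up with the ~4.3 Gbps of client read traffic shown further
down, not with the ~8 Gbps that ceph status reports.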

Thanks,
Muthu

On Wed, Feb 21, 2018 at 6:57 PM, Alfredo Deza <[email protected]> wrote:

>
>
> On Tue, Feb 20, 2018 at 9:33 PM, nokia ceph <[email protected]>
> wrote:
>
>> Hi Alfredo Deza,
>>
>> I understand the distinction between lvm and simple, however we still see the
>> issue. Could it be an issue in Luminous, since we use the same ceph config and
>> workload from the client? The graphs I attached in the previous mail are from
>> ceph-volume lvm OSDs.
>>
>
> If the issue is a performance regression in Luminous I wouldn't know :( I
> was trying to say that if you are seeing the same regression with
> previously deployed OSDs, then it can't possibly be something we are doing
> incorrectly in ceph-volume.
>
>
>>
>> In this case, does it occupy 2 times more only inside Ceph? And if we consider
>> only the lvm-based systems, could the high IOPS be because of a dm-cache
>> created for each OSD?
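>>
>> (We can check whether dm-cache is actually in play: assuming standard LVM
>> tooling, `lvs -a -o lv_name,segtype` should show a `cache` segment type for
>> any cache LVs, while plain ceph-volume lvm OSDs should only show `linear`.)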
>>
>
> Not sure again. Maybe someone else might be able to chime in on this.
>
>
>>
>> Meanwhile I will share some graphs to show this once I have them.
>>
>> Thanks,
>> Muthu
>>
>> On Tuesday, February 20, 2018, Alfredo Deza <[email protected]> wrote:
>>
>>>
>>>
>>> On Mon, Feb 19, 2018 at 9:29 PM, nokia ceph <[email protected]>
>>> wrote:
>>>
>>>> Hi Alfredo Deza,
>>>>
>>>> We have a 5-node platform with lvm OSDs created from scratch and another
>>>> 5-node platform migrated from Kraken, which uses ceph-volume simple. Both
>>>> have the same issue. Both platforms have only HDDs for OSDs.
>>>>
>>>> We also noticed 2 times more disk IOPS compared to Kraken, which causes
>>>> lower read performance. During RocksDB compaction the situation is worse.
>>>>
>>>>
>>>> Meanwhile we are building another platform, creating the OSDs with
>>>> ceph-disk, and will analyse the behaviour there.
>>>>
>>>
>>> If you have two platforms, one with `simple` and the other one with
>>> `lvm` experiencing the same, then something else must be at fault here.
>>>
>>> The `simple` setup in ceph-volume basically keeps everything as it was
>>> before; it just captures details of which devices were being used so the OSDs
>>> can be started. There is no interaction from ceph-volume
>>> in there that could cause something like this.
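>>>
>>> For context, the `simple` workflow amounts to something like (paths and ids
>>> here are placeholders):
>>>
>>>     ceph-volume simple scan /var/lib/ceph/osd/ceph-<id>
>>>     ceph-volume simple activate <id> <osd-fsid>
>>>
>>> The scan step only records the devices in a JSON file under /etc/ceph/osd/,
>>> and activate starts the OSD from that file; neither step creates or modifies
>>> partitions or logical volumes.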
>>>
>>>
>>>
>>>> Thanks,
>>>> Muthu
>>>>
>>>>
>>>>
>>>> On Tuesday, February 20, 2018, Alfredo Deza <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 19, 2018 at 2:01 PM, nokia ceph <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We have 5-node clusters with EC 4+1 and have been using BlueStore since
>>>>>> last year, starting from Kraken.
>>>>>> Recently we migrated all our platforms to Luminous 12.2.2; all OSDs were
>>>>>> migrated to the ceph-volume simple type, and on a few platforms we
>>>>>> installed Ceph using ceph-volume.
>>>>>>
>>>>>> Now we see two times more read traffic compared to the client traffic on
>>>>>> both the migrated and the newly created platforms. This was not the case
>>>>>> in older releases, where the ceph status read B/W was the same as the
>>>>>> client read traffic.
>>>>>>
>>>>>> Some network graphs:
>>>>>>
>>>>>> *Client network interface* towards the ceph public interface: shows
>>>>>> *4.3 Gbps* read
>>>>>>
>>>>>>
>>>>>> [image: Inline image 2]
>>>>>>
>>>>>> *Ceph node public interface*: each node around 960 Mbps x 5 nodes =
>>>>>> *4.6 Gbps*, and this matches.
>>>>>> [image: Inline image 3]
>>>>>>
>>>>>> Ceph status output: shows 1032 MB/s = *8.06 Gbps*
>>>>>>
>>>>>> cn6.chn6us1c1.cdn ~# ceph status
>>>>>>   cluster:
>>>>>>     id:     abda22db-3658-4d33-9681-e3ff10690f88
>>>>>>     health: HEALTH_OK
>>>>>>
>>>>>>   services:
>>>>>>     mon: 5 daemons, quorum cn6,cn7,cn8,cn9,cn10
>>>>>>     mgr: cn6(active), standbys: cn7, cn9, cn10, cn8
>>>>>>     osd: 340 osds: 340 up, 340 in
>>>>>>
>>>>>>   data:
>>>>>>     pools:   1 pools, 8192 pgs
>>>>>>     objects: 270M objects, 426 TB
>>>>>>     usage:   581 TB used, 655 TB / 1237 TB avail
>>>>>>     pgs:     8160 active+clean
>>>>>>              32   active+clean+scrubbing
>>>>>>
>>>>>>   io:
>>>>>>     client:   *1032 MB/s rd*, 168 MB/s wr, 1908 op/s rd, 1594 op/s wr
>>>>>>
>>>>>>
>>>>>> For write operations we don't see this issue; there the client traffic and
>>>>>> the ceph status figure match.
>>>>>> Is this expected behavior in Luminous with ceph-volume lvm, or a bug?
>>>>>> A wrong calculation in the ceph status read B/W?
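>>>>>>
>>>>>> As a rough sanity check on the units: 1032 MB/s x 8 is roughly 8.3 Gb/s
>>>>>> (about 8.06 if dividing by 1024), while the client-side 4.3 Gbps is only
>>>>>> about 540 MB/s, so the two figures differ by roughly a factor of two
>>>>>> either way.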
>>>>>>
>>>>>
>>>>> You mentioned `ceph-volume simple` but here you say lvm. With LVM
>>>>> ceph-volume will create the OSDs from scratch, while "simple" will keep
>>>>> whatever OSD was created before.
>>>>>
>>>>> Have you created the OSDs from scratch with ceph-volume? Or is it just
>>>>> using "simple", managing a previously deployed OSD?
>>>>>
>>>>>>
>>>>>> Please provide your feedback.
>>>>>>
>>>>>> Thanks,
>>>>>> Muthu
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
