Or even vm.vfs_cache_pressure = 0 if you have sufficient memory to *pin* 
inode/dentries in memory.
We have been using that for a long time now (with 128 TB node memory) and it
seems to help, especially for the random write workload, by saving xattr
reads in between.
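For concreteness, a minimal sketch of reading (and, with root, setting) this
sysctl from Python; the knob lives at /proc/sys/vm/vfs_cache_pressure:

```python
# Minimal sketch: inspect, and (with root privileges) set, vm.vfs_cache_pressure.
from pathlib import Path

KNOB = Path("/proc/sys/vm/vfs_cache_pressure")

def read_cache_pressure(path: Path = KNOB) -> int:
    """Return the current value (the kernel default is 100)."""
    return int(path.read_text().strip())

def set_cache_pressure(value: int, path: Path = KNOB) -> None:
    """Lower values keep dentries/inodes cached longer; 0 never reclaims them."""
    if value < 0:
        raise ValueError("vfs_cache_pressure must be non-negative")
    path.write_text(f"{value}\n")  # requires root; persist via /etc/sysctl.conf
```

Note that 0 means the kernel never reclaims the dentry/inode caches, which is
why "sufficient memory" is the caveat: under memory pressure there is nothing
to give back.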

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:[email protected]] On Behalf Of Warren 
Wang - ISD
Sent: Thursday, June 23, 2016 3:09 PM
To: Wade Holler; Blair Bethwaite
Cc: Ceph Development; [email protected]
Subject: Re: [ceph-users] Dramatic performance drop at certain number of 
objects in pool

vm.vfs_cache_pressure = 100

Go the other direction on that. You'll want to keep it low to help keep 
inode/dentry info in memory. We use 10, and haven't had a problem.


Warren Wang




On 6/22/16, 9:41 PM, "Wade Holler" <[email protected]> wrote:

>Blairo,
>
>We'll speak in pre-replication numbers, replication for this pool is 3.
>
>23.3 Million Objects / OSD
>pg_num 2048
>16 OSDs / Server
>3 Servers
>660 GB RAM Total, 179 GB Used (free -t) / Server
>vm.swappiness = 1
>vm.vfs_cache_pressure = 100
>
>Workload is native librados with python.  ALL 4k objects.
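[For context, a hedged sketch of what such a workload might look like with
python-rados. The pool name and conf path below are assumptions, and the
rados import is deferred so the payload helper works without a cluster.]

```python
# Sketch of the workload described: uniform 4 KiB objects via python-rados.
OBJ_SIZE = 4096  # "ALL 4k objects"

def payload(seed: int, size: int = OBJ_SIZE) -> bytes:
    """Deterministic 4 KiB payload for object number `seed`."""
    return (seed.to_bytes(8, "big") * (size // 8))[:size]

def write_objects(pool: str, count: int, conf: str = "/etc/ceph/ceph.conf") -> None:
    """Write `count` 4 KiB objects into `pool` (needs a reachable cluster)."""
    import rados  # deferred: python-rados only needed when actually writing
    cluster = rados.Rados(conffile=conf)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            for i in range(count):
                ioctx.write_full(f"obj-{i:012d}", payload(i))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```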
>
>Best Regards,
>Wade
>
>
>On Wed, Jun 22, 2016 at 9:33 PM, Blair Bethwaite
><[email protected]> wrote:
>> Wade, good to know.
>>
>> For the record, what does this work out to roughly per OSD? And how
>> much RAM and how many PGs per OSD do you have?
>>
>> What's your workload? I wonder whether for certain workloads (e.g.
>> RBD) it's better to increase default object size somewhat before
>> pushing the split/merge up a lot...
>>
>> Cheers,
>>
>> On 23 June 2016 at 11:26, Wade Holler <[email protected]> wrote:
>>> Based on everyone's suggestions: the first modification to 50 / 16
>>> enabled our config to reach ~645 million objects before the behavior in
>>> question was observed (~330 million was the previous ceiling). Subsequent
>>> modification to 50 / 24 has enabled us to reach 1.1 billion+.
>>>
>>> Thank you all very much for your support and assistance.
>>>
>>> Best Regards,
>>> Wade
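[For readers following along: the commonly cited filestore rule is that a
subdirectory splits once it holds more than merge_threshold * split_multiple
* 16 files. That formula is worth verifying against your Ceph version's
documentation; a small sketch of the arithmetic:]

```python
# Estimate the filestore split point from the two settings this thread
# discusses ("50 / 16" style pairs are merge_threshold / split_multiple).

def split_point(merge_threshold: int, split_multiple: int) -> int:
    """Files allowed in one filestore subdirectory before it splits."""
    return abs(merge_threshold) * split_multiple * 16

# Settings from the thread:
#   defaults 10 / 2  ->    320 files per subdir
#   50 / 16          -> 12,800
#   50 / 24          -> 19,200
```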
>>>
>>>
>>> On Mon, Jun 20, 2016 at 6:58 PM, Christian Balzer <[email protected]>
>>>wrote:
>>>>
>>>> Hello,
>>>>
>>>> On Mon, 20 Jun 2016 20:47:32 +0000 Warren Wang - ISD wrote:
>>>>
>>>>> Sorry, late to the party here. I agree, up the merge and split
>>>>> thresholds. We're as high as 50/12. I chimed in on an RH ticket here.
>>>>> One of those things you just have to find out as an operator since
>>>>> it's not well documented :(
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1219974
>>>>>
>>>>> We have over 200 million objects in this cluster, and it's still
>>>>> doing over 15000 write IOPS all day long with 302 spinning drives
>>>>> + SATA SSD journals. Having enough memory and dropping your
>>>>> vfs_cache_pressure should also help.
>>>>>
>>>> Indeed.
>>>>
>>>> Since it was asked in that bug report and was also my first suspicion,
>>>> it would probably be a good time to clarify that it isn't the splits
>>>> that cause the performance degradation, but the resulting inflation
>>>> of dir entries and exhaustion of SLAB, and thus having to go to disk
>>>> for things that normally would be in memory.
>>>>
>>>> Looking at Blair's graph from yesterday pretty much makes that
>>>> clear; a purely split-caused degradation should have relented much
>>>> quicker.
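[One way to watch the dentry inflation Christian describes is
/proc/sys/fs/dentry-state, whose first two fields are nr_dentry (allocated
dentries) and nr_unused (unused but cached). A minimal parser sketch:]

```python
# Parse /proc/sys/fs/dentry-state to track dentry cache growth over time.

def parse_dentry_state(text: str) -> tuple[int, int]:
    """Return (nr_dentry, nr_unused), the first two fields of the file."""
    fields = text.split()
    return int(fields[0]), int(fields[1])

def read_dentry_state(path: str = "/proc/sys/fs/dentry-state") -> tuple[int, int]:
    with open(path) as f:
        return parse_dentry_state(f.read())
```

Sampling this periodically across OSD nodes should make the "inflation of
dir entries" visible well before latency degrades.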
>>>>
>>>>
>>>>> Keep in mind that if you change the values, it won't take effect
>>>>> immediately. It only merges them back if the directory is under
>>>>> the calculated threshold and a write occurs (maybe a read, I forget).
>>>>>
>>>> If it's a read a plain scrub might do the trick.
>>>>
>>>> Christian
>>>>> Warren
>>>>>
>>>>>
>>>>> From: ceph-users <[email protected]> on behalf of Wade Holler
>>>>> <[email protected]>
>>>>> Date: Monday, June 20, 2016 at 2:48 PM
>>>>> To: Blair Bethwaite <[email protected]>, Wido den Hollander
>>>>> <[email protected]>
>>>>> Cc: Ceph Development <[email protected]>,
>>>>> "[email protected]" <[email protected]>
>>>>> Subject: Re: [ceph-users] Dramatic performance drop at certain number
>>>>> of objects in pool
>>>>>
>>>>> Thanks everyone for your replies. I sincerely appreciate it. We
>>>>> are testing with different pg_num and filestore_split_multiple settings.
>>>>> Early indications are ... well, not great. Regardless, it is nice
>>>>> to understand the symptoms better so we can try to design around them.
>>>>>
>>>>> Best Regards,
>>>>> Wade
>>>>>
>>>>>
>>>>> On Mon, Jun 20, 2016 at 2:32 AM Blair Bethwaite
>>>>> <[email protected]> wrote:
>>>>> On 20 June 2016 at 09:21, Blair Bethwaite
>>>>> <[email protected]> wrote:
>>>>> > slow request issues). If you watch your xfs stats you'll likely
>>>>> > get further confirmation. In my experience xs_dir_lookups balloons
>>>>> > (which means directory lookups are missing cache and going to disk).
>>>>>
>>>>> Murphy's a bitch. Today we upgraded a cluster to latest Hammer in
>>>>> preparation for Jewel/RHCS2. Turns out when we last hit this very
>>>>> problem we had only ephemerally set the new filestore merge/split
>>>>> values - oops. Here's what started happening when we upgraded and
>>>>> restarted a bunch of OSDs:
>>>>>
>>>>>https://au-east.erc.monash.edu.au/swift/v1/public/grafana-ceph-xs_dir_lookup.png
>>>>>
>>>>> Seemed to cause lots of slow requests :-/. We corrected it about
>>>>> 12:30, then it still took a while to settle.
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> ~Blairo
>>>>>
>>>>
>>>>
>>>> --
>>>> Christian Balzer        Network/Systems Engineer
>>>> [email protected]           Global OnLine Japan/Rakuten Communications
>>>> http://www.gol.com/
>>
>>
>>
>> --
>> Cheers,
>> ~Blairo

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
