Yeah, perf top will help you a lot.

Some guesses:

1. If your block size is small (in the 4-16K range), you are most probably 
hitting the tcmalloc issue. 'perf top' will show a lot of tcmalloc traces in 
that case.

2. fdcache should save you some CPU, but I don't expect the effect to be that significant.
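For reference, a rough way to check both things on the OSD host. This is a sketch, not an exact recipe: the `ceph-osd` pid lookup is an assumption about how the daemon is named on your system, and the 16384 figure is the "fd cache size" value from Jan's mail.

```shell
#!/bin/sh
# Sketch: find an OSD's CPU hotspots and compare its open-fd count
# against the configured fd cache size (16384 in Jan's setup).

# Assumption: the daemon process is named "ceph-osd"; fall back to this
# shell's pid purely so the fd-count part below is illustrable anywhere.
PID=$(pidof -s ceph-osd || echo $$)

# Live view of where that process burns CPU; look for tcmalloc frames.
# (Interactive, so commented out here.)
# perf top -p "$PID"

# Count open file descriptors via /proc and compare to the cache size.
FDS=$(ls "/proc/$PID/fd" | wc -l)
echo "pid $PID has $FDS open fds"
if [ "$FDS" -gt 16384 ]; then
    echo "exceeds fd cache size; cache misses are plausible"
else
    echo "within fd cache size"
fi
```

If the fd count sits consistently above the cache size, that is at least consistent with Jan's suspicion that cache misses are involved.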

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:[email protected]] On Behalf Of Jan 
Schermer
Sent: Thursday, June 11, 2015 5:57 AM
To: Dan van der Ster
Cc: [email protected]
Subject: Re: [ceph-users] Restarting OSD leads to lower CPU usage

I have no experience with perf and the package is not installed.
I will take a look at it, thanks.

Jan


> On 11 Jun 2015, at 13:48, Dan van der Ster <[email protected]> wrote:
>
> Hi Jan,
>
> Can you get perf top running? It should show you where the OSDs are 
> spinning...
>
> Cheers, Dan
>
> On Thu, Jun 11, 2015 at 11:21 AM, Jan Schermer <[email protected]> wrote:
>> Hi,
>> hoping someone can point me in the right direction.
>>
>> Some of my OSDs have a larger CPU usage (and ops latencies) than others. If 
>> I restart the OSD everything runs nicely for some time, then it creeps up.
>>
>> 1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to 
>> 80%. Restarting means the offending OSDs only use 40% again.
>> 2) average latencies and CPU usage on the host are the same - so it’s
>> not caused by the host that the OSD is running on.
>> 3) I can’t say exactly when or how the issue happens. I can’t even say if 
>> it’s the same OSDs. It seems it either happens when something heavy happens 
>> in a cluster (like dropping very old snapshots, rebalancing) and then 
>> doesn’t come back, or maybe it happens slowly over time and I can’t find it 
>> in the graphs. Looking at the graphs it seems to be the former.
>>
>> I have just one suspicion and that is the “fd cache size” - we have
>> it set to 16384 but the open fds suggest there are more open files for the 
>> osd process (over 17K fds) - it varies by some hundreds between the osds. 
>> Maybe some are just slightly over the limit and the misses cause this? 
>> Restarting the OSD clears them (~2K) and they increase over time. I 
>> increased it to 32768 yesterday and it is consistently nice now, but it might 
>> take another few days to manifest… Could this explain it? Any other tips?
>>
>> Thanks
>>
>> Jan
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

