Re: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?

I won't elaborate on this much more; I hope some of you who have NUMA systems 
will try it and see for yourselves.

But I can recommend some docs:
http://globalsp.ts.fujitsu.com/dmsp/Publications/public/wp-ivy-bridge-ep-memory-performance-ww-en.pdf

http://events.linuxfoundation.org/sites/events/files/eeus13_shelton.pdf

RHEL also has some nice documentation on the issue. If you aren’t running anything 
ancient (like RHEL 6), your OS and kernel should do the “right thing” by default 
and take NUMA locality into account when scheduling and migrating processes.
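
A quick way to check your NUMA topology and whether memory is actually being 
allocated locally (numactl and numastat ship in the numactl package on most 
distributions; the process name below is just an example):

    # show the NUMA nodes with their CPUs and memory
    numactl --hardware

    # per-node allocation counters; growing numa_miss/numa_foreign counts
    # mean processes are being handed memory from a remote node
    numastat

    # per-process view, e.g. for the OSDs
    numastat -p ceph-osd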

Jan


> On 01 Jul 2015, at 03:02, Ray Sun <[email protected]> wrote:
> 
> Jan,
> Thanks a lot. I'll contribute to this project if I can.
> 
> Best Regards
> -- Ray
> 
> On Tue, Jun 30, 2015 at 11:50 PM, Jan Schermer <[email protected]> wrote:
> Hi all,
> our script is available on GitHub
> 
> https://github.com/prozeta/pincpus
> 
> I haven’t had much time to write a proper README, but I hope the configuration 
> is self-explanatory enough for now.
> What it does is pin each OSD into the most “empty” cgroup assigned to a NUMA 
> node.
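> 
> For anyone who just wants the general idea, here is a minimal sketch of NUMA-bound 
> cpuset (cgroup v1) pinning - not the script itself - assuming node 0 owns CPUs 0-7 
> and we want to pin osd.0 (adjust the CPU list, group name and paths to your topology):
> 
>     # create a cpuset group bound to NUMA node 0 (CPUs and memory)
>     mkdir /sys/fs/cgroup/cpuset/osd_node0
>     echo 0-7 > /sys/fs/cgroup/cpuset/osd_node0/cpuset.cpus
>     echo 0   > /sys/fs/cgroup/cpuset/osd_node0/cpuset.mems
> 
>     # move the running osd.0 (with all of its threads) into the group
>     echo $(cat /var/run/ceph/osd.0.pid) > /sys/fs/cgroup/cpuset/osd_node0/cgroup.procs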
> 
> Let me know how it works for you!
> 
> Jan
> 
> 
>> On 30 Jun 2015, at 10:50, Huang Zhiteng <[email protected]> wrote:
>> 
>> 
>> 
>> On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer <[email protected]> wrote:
>> Not having OSDs and KVMs compete with each other is one thing.
>> But there are more reasons to do this:
>> 
>> 1) the processes and threads aren't moved between cores as much (better 
>> cache utilization)
>> 2) the processes are aligned with their memory on NUMA systems (which means all 
>> modern dual-socket systems) - you don’t want your OSD running on CPU 1 with its 
>> memory allocated to CPU 2
>> 3) the same goes for other resources like NICs or storage controllers - but 
>> that’s less important and not always practical to do
>> 4) you can limit the scheduling domain on Linux if you limit the cpuset for 
>> your OSDs (I’m not sure how important this is, just best practice)
>> 5) you can easily limit memory or CPU usage and set priorities, with much greater 
>> granularity than without cgroups (see the sketch after this list)
>> 6) if you have HyperThreading enabled you get the most gain when the 
>> workloads on the two threads are dissimilar - so to get the highest throughput 
>> you would pin the OSD to thread 1 and KVM to thread 2 on the same core. We’re 
>> not doing that because the latency and performance of the core can vary 
>> depending on what the other thread is doing. But it might be useful to 
>> someone.
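>> 
>> For 5), a minimal sketch using the libcgroup tools (the group name and the limit 
>> values here are made-up examples, not recommendations):
>> 
>>     # create a group with the memory and cpu controllers
>>     cgcreate -g memory,cpu:ceph-osd
>>     # cap the group at ~8 GB RAM and give it a reduced CPU weight
>>     cgset -r memory.limit_in_bytes=8G ceph-osd
>>     cgset -r cpu.shares=512 ceph-osd
>>     # move the running OSD processes into the group
>>     cgclassify -g memory,cpu:ceph-osd $(pidof ceph-osd)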
>> 
>> Some workloads show a >100% performance gain when everything aligns in a 
>> NUMA system, compared to the same hardware running in SMP mode. You likely won’t 
>> notice it with light workloads, as the interconnects (QPI) are very fast and 
>> there’s a lot of bandwidth, but for things like big OLAP databases or other 
>> data-manipulation workloads there’s a huge difference. And with Ceph being 
>> CPU hungry and memory intensive, we’re seeing some big gains here just by 
>> co-locating the memory with the processes…
>> Could you elaborate a bit on this? I'm interested to learn in which situations 
>> memory locality helps Ceph, and to what extent.
>> 
>> 
>> Jan
>> 
>>  
>>> On 30 Jun 2015, at 08:12, Ray Sun <[email protected]> wrote:
>>> 
>>> Sounds great - if there is any update, please let me know.
>>> 
>>> Best Regards
>>> -- Ray
>>> 
>>> On Tue, Jun 30, 2015 at 1:46 AM, Jan Schermer <[email protected]> wrote:
>>> I promised you all our scripts for automatic cgroup assignment - they are 
>>> already in production and I just need to put them on GitHub. Stay tuned 
>>> tomorrow :-)
>>> 
>>> Jan
>>> 
>>> 
>>>> On 29 Jun 2015, at 19:41, Somnath Roy <[email protected]> wrote:
>>>> 
>>>> Presently, you have to do it using tools like ‘taskset’ or ‘numactl’…
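>>>> 
>>>> For example (the CPU list and OSD id are placeholders - pick them to match 
>>>> your NUMA topology):
>>>> 
>>>>     # pin an already-running osd.0 to cores 0-5
>>>>     taskset -cp 0-5 $(cat /var/run/ceph/osd.0.pid)
>>>> 
>>>>     # or start the OSD bound to NUMA node 0 (both CPUs and memory)
>>>>     numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 \
>>>>         --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph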
>>>>  
>>>> Thanks & Regards
>>>> Somnath
>>>>  
>>>> From: ceph-users [mailto:[email protected]] On Behalf Of Ray Sun
>>>> Sent: Monday, June 29, 2015 9:19 AM
>>>> To: [email protected] <mailto:[email protected]>
>>>> Subject: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu 
>>>> core?
>>>>  
>>>> Cephers,
>>>> I want to bind each of my ceph-osd processes to a specific CPU core, but I didn’t 
>>>> find any documentation explaining how to do that. Could anyone provide me with 
>>>> some more detailed information? Thanks.
>>>>  
>>>> Currently, my ceph is running like this:
>>>>  
>>>> root      28692      1  0 Jun23 ?        00:37:26 /usr/bin/ceph-mon -i 
>>>> seed.econe.com --pid-file 
>>>> /var/run/ceph/mon.seed.econe.com.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root      40063      1  1 Jun23 ?        02:13:31 /usr/bin/ceph-osd -i 0 
>>>> --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root      42096      1  0 Jun23 ?        01:33:42 /usr/bin/ceph-osd -i 1 
>>>> --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root      43263      1  0 Jun23 ?        01:22:59 /usr/bin/ceph-osd -i 2 
>>>> --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root      44527      1  0 Jun23 ?        01:16:53 /usr/bin/ceph-osd -i 3 
>>>> --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root      45863      1  0 Jun23 ?        01:25:18 /usr/bin/ceph-osd -i 4 
>>>> --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root      47462      1  0 Jun23 ?        01:20:36 /usr/bin/ceph-osd -i 5 
>>>> --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>  
>>>> Best Regards
>>>> -- Ray
>>>> 
>>> 
>> 
>> -- 
>> Regards
>> Huang Zhiteng
> 

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
