Re: your previous question - I will not elaborate on this much more; I hope some of you who have NUMA systems will try it and see for yourselves.
But I can recommend some docs:

http://globalsp.ts.fujitsu.com/dmsp/Publications/public/wp-ivy-bridge-ep-memory-performance-ww-en.pdf
http://events.linuxfoundation.org/sites/events/files/eeus13_shelton.pdf

RHEL also has some nice documentation on the issue.

If you aren't running an ancient system (like RHEL 6), your OS and kernel should do the "right thing" by default and take NUMA locality into account when scheduling and migrating.

Jan

> On 01 Jul 2015, at 03:02, Ray Sun <[email protected]> wrote:
>
> Jan,
> Thanks a lot. I'll contribute to this project if I can.
>
> Best Regards
> -- Ray
>
> On Tue, Jun 30, 2015 at 11:50 PM, Jan Schermer <[email protected]> wrote:
> Hi all,
> our script is available on GitHub:
>
> https://github.com/prozeta/pincpus
>
> I haven't had much time to write a proper README, but I hope the configuration
> is self-explanatory enough for now.
> What it does is pin each OSD into the most "empty" cgroup assigned to a NUMA node.
>
> Let me know how it works for you!
>
> Jan
>
>> On 30 Jun 2015, at 10:50, Huang Zhiteng <[email protected]> wrote:
>>
>> On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer <[email protected]> wrote:
>> Not having OSDs and KVM guests compete with each other is one thing,
>> but there are more reasons to do this:
>>
>> 1) Processes and threads aren't moved between cores as much (better cache utilization).
>> 2) Processes stay aligned with their memory on NUMA systems (which means all modern
>> dual-socket systems) - you don't want your OSD running on CPU 1 with its memory
>> attached to CPU 2.
>> 3) The same goes for other resources like NICs or storage controllers - but
>> that's less important and not always practical to do.
>> 4) You can limit the scheduling domain on Linux if you limit the cpuset for
>> your OSDs (I'm not sure how important this is, just best practice).
>> 5) You can easily limit memory or CPU usage and set priorities with much finer
>> granularity than without cgroups.
>> 6) If you have Hyper-Threading enabled, you get the most gain when the
>> workloads on the two threads are dissimilar - so for the highest throughput
>> you would pin the OSD to thread 1 and KVM to thread 2 of the same core. We're
>> not doing that, because the latency and performance of a core can vary
>> depending on what the other thread is doing, but it might be useful to someone.
>>
>> Some workloads show a >100% performance gain when everything aligns in a
>> NUMA system, compared to SMP mode on the same hardware. You likely won't
>> notice it under light workloads, as the interconnects (QPI) are very fast and
>> there's a lot of bandwidth, but for things like big OLAP databases or other
>> data-manipulation workloads there's a huge difference. And with Ceph being
>> CPU-hungry and memory-intensive, we're seeing some big gains here just by
>> co-locating the memory with the processes.
>>
>> Could you elaborate a bit on this? I'm interested to learn in which situations
>> memory locality helps Ceph, and to what extent.
>>
>> Jan
>>
>>> On 30 Jun 2015, at 08:12, Ray Sun <[email protected]> wrote:
>>>
>>> Sounds great, any update please let me know.
>>>
>>> Best Regards
>>> -- Ray
>>>
>>> On Tue, Jun 30, 2015 at 1:46 AM, Jan Schermer <[email protected]> wrote:
>>> I promised you all our scripts for automatic cgroup assignment - they are
>>> already in our production and I just need to put them on GitHub. Stay tuned
>>> tomorrow :-)
>>>
>>> Jan
>>>
>>>> On 29 Jun 2015, at 19:41, Somnath Roy <[email protected]> wrote:
>>>>
>>>> Presently, you have to do it with a tool like 'taskset' or 'numactl'.
>>>>
>>>> Thanks & Regards
>>>> Somnath
>>>>
>>>> From: ceph-users [mailto:[email protected]] On Behalf Of Ray Sun
>>>> Sent: Monday, June 29, 2015 9:19 AM
>>>> To: [email protected]
>>>> Subject: [ceph-users] How to use cgroup to bind ceph-osd to a specific cpu core?
>>>>
>>>> Cephers,
>>>> I want to bind each of my ceph-osd daemons to a specific CPU core, but I didn't
>>>> find any documentation explaining how to do that; could anyone provide some
>>>> detailed information? Thanks.
>>>>
>>>> Currently, my ceph is running like this:
>>>>
>>>> root 28692 1 0 Jun23 ? 00:37:26 /usr/bin/ceph-mon -i seed.econe.com --pid-file /var/run/ceph/mon.seed.econe.com.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root 40063 1 1 Jun23 ? 02:13:31 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root 42096 1 0 Jun23 ? 01:33:42 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root 43263 1 0 Jun23 ? 01:22:59 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root 44527 1 0 Jun23 ? 01:16:53 /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root 45863 1 0 Jun23 ? 01:25:18 /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>> root 47462 1 0 Jun23 ? 01:20:36 /usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>
>>>> Best Regards
>>>> -- Ray
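
A minimal sketch of the cgroup approach Ray asks about and Jan describes (one cpuset cgroup per NUMA node, with OSDs pinned into it), using the cgroup-v1 interface. The cgroup name, the assumption that node 0 owns cores 0-9, and the reuse of osd.0's PID 40063 from the listing above are illustrative only and not taken from the pincpus script:

    # Create a cpuset cgroup tied to NUMA node 0 (assuming node 0 owns cores 0-9)
    mkdir -p /sys/fs/cgroup/cpuset/osd_node0
    echo 0-9 > /sys/fs/cgroup/cpuset/osd_node0/cpuset.cpus
    echo 0   > /sys/fs/cgroup/cpuset/osd_node0/cpuset.mems
    # Optionally migrate pages the process has already allocated to node 0 as well
    echo 1   > /sys/fs/cgroup/cpuset/osd_node0/cpuset.memory_migrate
    # Move a running OSD (osd.0 has PID 40063 in the listing above) and all of its threads
    echo 40063 > /sys/fs/cgroup/cpuset/osd_node0/cgroup.procs

Note that cpuset.cpus and cpuset.mems must be set before any process is added to the cgroup, and threads spawned afterwards inherit the placement.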
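The taskset/numactl route Somnath mentions can be sketched roughly as follows; again, the node and core numbers are examples, so check the real topology first:

    # Show which CPUs and how much memory belong to each NUMA node
    numactl --hardware
    # Start an OSD bound to node 0 for both CPU scheduling and memory allocation
    numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
    # Or change the CPU affinity of an already-running OSD and all of its threads
    # (this only changes affinity; it does not move memory that is already allocated)
    taskset -a -p -c 0-9 40063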
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
