On 27-07-15 14:21, Jan Schermer wrote:
> Hi!
> The /cgroup/* mount point is probably a RHEL 6 thing; recent distributions
> seem to use /sys/fs/cgroup like in your case (maybe because of systemd?). On
> RHEL 6 the mount points are configured in /etc/cgconfig.conf and /cgroup is
> the default.
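
As an aside, a distribution-agnostic way to locate the cpuset hierarchy is to
read it from /proc/mounts instead of hard-coding either path. A minimal shell
sketch (the variable name is illustrative, not taken from the pincpus scripts):

  # Find where the cpuset cgroup (v1) controller is mounted:
  # /cgroup/cpuset on RHEL 6, /sys/fs/cgroup/cpuset on most systemd distros.
  CPUSET_ROOT=$(awk '$3 == "cgroup" && $4 ~ /cpuset/ {print $2; exit}' /proc/mounts)
  if [ -z "$CPUSET_ROOT" ]; then
      echo "cpuset cgroup hierarchy is not mounted" >&2
      exit 1
  fi
  echo "cpuset hierarchy: $CPUSET_ROOT"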
> 
> I also saw your pull request on GitHub, and I don’t think I’ll merge it:
> creating the directory when the parent does not exist could mask the absence
> of cgroups or a different mountpoint, so I think it’s better to fail and
> leave it up to the admin to modify the script.
> A more mature solution would probably be some sort of OS-specific integration
> (automatic cgclassify rules, init-scripted cgroup creation and so on). Once
> that support is in place, maintainers only need to integrate it. In newer
> distros a newer kernel (scheduler) with more NUMA awareness and other
> autotuning could do a better job than this script by default.
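
To sketch what such integration might look like with libcgroup on RHEL 6 (the
group name and CPU/node numbers below are made up, not a tested profile):
/etc/cgconfig.conf can create a per-NUMA-node cpuset at boot, and
/etc/cgrules.conf can let cgrulesengd classify new ceph-osd processes into it
automatically:

  # /etc/cgconfig.conf -- create the cgroup at boot (example values)
  group ceph_node0 {
      cpuset {
          cpuset.cpus = "0-11";
          cpuset.mems = "0";
      }
  }

  # /etc/cgrules.conf -- classify ceph-osd processes into that cgroup
  # <user>:<process>   <controllers>   <destination>
  *:ceph-osd           cpuset          ceph_node0/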
> 
> And if any Ceph devs are listening: I saw an issue on the Ceph tracker for
> cgroup classification (http://tracker.ceph.com/issues/12424) and I humbly
> advise against doing that - it will either turn into something
> distro-specific or it will create an Inner Platform Effect on every distro,
> which downstream maintainers will need to replace with their own tooling
> anyway. Of course, since Inktank is now somewhat part of Red Hat, it makes
> sense to integrate it into the RHOS, RHEV and Ceph packages for RHEL and
> ship a profile for “tuned” or whatever does the tuning magic.
> 
> Btw, has anybody else tried it? What are your results? We still use it and it
> makes a big difference on NUMA systems, and an even bigger difference when
> mixed with KVM guests on the same hardware.
>  

I'm testing it on 48-core, 256GB machines with 90 OSDs each. This
is a roughly 20PB Ceph cluster and I'm trying to see how much we would
benefit from it.

Wido

> Thanks
> Jan
> 
> 
> 
>> On 27 Jul 2015, at 13:23, Saverio Proto <ziopr...@gmail.com> wrote:
>>
>> Hello Jan,
>>
>> I am testing your scripts, because we also want to test OSDs and VMs
>> on the same server.
>>
>> I am new to cgroups, so this might be a very newbie question.
>> In your script you always reference the file
>> /cgroup/cpuset/libvirt/cpuset.cpus
>>
>> but I have the file in /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus
>>
>> I am working on Ubuntu 14.04
>>
>> Does this difference come from something special in your setup, or from
>> the fact that we are working on different Linux distributions?
>>
>> Thanks for the clarification.
>>
>> Saverio
>>
>>
>>
>> 2015-06-30 17:50 GMT+02:00 Jan Schermer <j...@schermer.cz>:
>>> Hi all,
>>> our script is available on GitHub
>>>
>>> https://github.com/prozeta/pincpus
>>>
>>> I haven’t had much time to write a proper README, but I hope the configuration
>>> is self-explanatory enough for now.
>>> What it does is pin each OSD into the most “empty” cgroup assigned to a NUMA
>>> node.
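
For readers new to cgroups, the manual equivalent of one such assignment looks
roughly like this under cgroup v1 (the path, CPU range and node number are
illustrative, not taken from the repository):

  # cpuset cgroup bound to NUMA node 0 (CPUs 0-11 are just an example)
  mkdir -p /sys/fs/cgroup/cpuset/ceph_node0
  echo 0-11 > /sys/fs/cgroup/cpuset/ceph_node0/cpuset.cpus
  echo 0    > /sys/fs/cgroup/cpuset/ceph_node0/cpuset.mems
  # move one OSD (all of its threads) into the cgroup
  echo $(pidof -s ceph-osd) > /sys/fs/cgroup/cpuset/ceph_node0/cgroup.procs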
>>>
>>> Let me know how it works for you!
>>>
>>> Jan
>>>
>>>
>>> On 30 Jun 2015, at 10:50, Huang Zhiteng <winsto...@gmail.com> wrote:
>>>
>>>
>>>
>>> On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer <j...@schermer.cz> wrote:
>>>>
>>>> Not having OSDs and KVM guests compete against each other is one thing,
>>>> but there are more reasons to do this:
>>>>
>>>> 1) not moving the processes and threads between cores that much (better
>>>> cache utilization)
>>>> 2) aligning the processes with memory on NUMA systems (that means all
>>>> modern dual socket systems) - you don’t want your OSD running on CPU1 with
>>>> memory allocated to CPU2
>>>> 3) the same goes for other resources like NICs or storage controllers -
>>>> but that’s less important and not always practical to do
>>>> 4) you can limit the scheduling domain on Linux if you limit the cpuset
>>>> for your OSDs (I’m not sure how important this is, just best practice)
>>>> 5) you can easily limit memory or CPU usage, set priority, with much
>>>> greater granularity than without cgroups
>>>> 6) if you have HyperThreading enabled you get the most gain when the
>>>> workloads on the two threads are dissimilar - so for the highest throughput
>>>> you would pin the OSD to thread 1 and KVM to thread 2 of the same core (see
>>>> the topology sketch after this list). We’re not doing that, because the
>>>> latency and performance of the core can vary depending on what the other
>>>> thread is doing. But it might be useful to someone.
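
A quick way to inspect the topology you are pinning against, using standard
sysfs and util-linux interfaces (nothing here is specific to the scripts being
discussed):

  # CPUs belonging to each NUMA node
  lscpu | grep -i numa
  cat /sys/devices/system/node/node0/cpulist
  # hyperthread sibling(s) sharing a physical core with CPU 0 (e.g. "0,24")
  cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list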
>>>>
>>>> Some workloads exhibit a >100% performance gain when everything aligns in a
>>>> NUMA system, compared to SMP mode on the same hardware. You likely won’t
>>>> notice it on light workloads, as the interconnects (QPI) are very fast and
>>>> there’s a lot of bandwidth, but for stuff like big OLAP databases or other
>>>> data-manipulation workloads there’s a huge difference. And with Ceph being
>>>> CPU-hungry and memory-intensive, we’re seeing some big gains here just by
>>>> co-locating the memory with the processes….
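
One way to check whether that co-location actually holds is to look at the
per-node memory breakdown of a running OSD, assuming the numastat tool from
the numactl package is installed:

  # most of the OSD's pages should sit on the node its CPUs are pinned to
  numastat -p $(pidof -s ceph-osd)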
>>>
>>> Could you elaborate a bit on this? I'm interested to learn in what situations
>>> memory locality helps Ceph, and to what extent.
>>>>
>>>>
>>>>
>>>> Jan
>>>>
>>>>
>>>>
>>>> On 30 Jun 2015, at 08:12, Ray Sun <xiaoq...@gmail.com> wrote:
>>>>
>>>> Sound great, any update please let me know.
>>>>
>>>> Best Regards
>>>> -- Ray
>>>>
>>>> On Tue, Jun 30, 2015 at 1:46 AM, Jan Schermer <j...@schermer.cz> wrote:
>>>>>
>>>>> I promised you all our scripts for automatic cgroup assignment - they are
>>>>> in our production already and I just need to put them on github, stay 
>>>>> tuned
>>>>> tomorrow :-)
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>> On 29 Jun 2015, at 19:41, Somnath Roy <somnath....@sandisk.com> wrote:
>>>>>
>>>>> Presently, you have to do it using a tool like ‘taskset’ or ‘numactl’…
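
For example (the PID, CPU list and node numbers are placeholders):

  # pin an already-running OSD to CPUs 0-11
  taskset -pc 0-11 <osd-pid>
  # or start an OSD with both its CPUs and its memory bound to NUMA node 0
  numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf --cluster ceph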
>>>>>
>>>>> Thanks & Regards
>>>>> Somnath
>>>>>
>>>>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>>>>> Ray Sun
>>>>> Sent: Monday, June 29, 2015 9:19 AM
>>>>> To: ceph-users@lists.ceph.com
>>>>> Subject: [ceph-users] How to use cgroup to bind ceph-osd to a specific
>>>>> cpu core?
>>>>>
>>>>> Cephers,
>>>>> I want to bind each of my ceph-osd processes to a specific CPU core, but I
>>>>> didn't find any document explaining how to do that. Could anyone provide me
>>>>> with some detailed information? Thanks.
>>>>>
>>>>> Currently, my ceph is running like this:
>>>>>
>>>>> root      28692      1  0 Jun23 ?        00:37:26 /usr/bin/ceph-mon -i
>>>>> seed.econe.com --pid-file /var/run/ceph/mon.seed.econe.com.pid -c
>>>>> /etc/ceph/ceph.conf --cluster ceph
>>>>> root      40063      1  1 Jun23 ?        02:13:31 /usr/bin/ceph-osd -i 0
>>>>> --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root      42096      1  0 Jun23 ?        01:33:42 /usr/bin/ceph-osd -i 1
>>>>> --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root      43263      1  0 Jun23 ?        01:22:59 /usr/bin/ceph-osd -i 2
>>>>> --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root      44527      1  0 Jun23 ?        01:16:53 /usr/bin/ceph-osd -i 3
>>>>> --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root      45863      1  0 Jun23 ?        01:25:18 /usr/bin/ceph-osd -i 4
>>>>> --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>> root      47462      1  0 Jun23 ?        01:20:36 /usr/bin/ceph-osd -i 5
>>>>> --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>
>>>>> Best Regards
>>>>> -- Ray
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Huang Zhiteng
>>>
>>>
>>>
>>>
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
