Hi Ceph-users,
I am having some trouble finding the bottleneck in my CephFS Infernalis
setup.
I am running 5 OSD servers with 6 OSDs each (so 30 OSDs in total). Each OSD is
a physical disk (non-SSD), and each OSD has its journal stored on the first
partition of its own disk. I have 3 mon servers and 2 MDS servers set up in
active/passive mode. All servers have redundant 10G NICs.
I am monitoring all resources on each server (CPU/memory/network/disk usage).
I expected the OSD disk speed to be my first bottleneck, but looking at my
graphs that is not the case: I have plenty of CPU, memory, network, and disk
speed left, yet I am still not able to get better performance. The cluster
reports that it is healthy. All settings are at their defaults except for
osd_op_threads, which I have raised from the default of 2 to 20.
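For reference, the change amounts to this in ceph.conf (the value 20 is just what I picked, not a recommendation):

```ini
[osd]
osd_op_threads = 20
```

As far as I know it can also be changed at runtime via `ceph tell osd.* injectargs '--osd_op_threads 20'`, which is how I tested different values.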
Looking at the processes on my OSD servers, you can see the expected ones
running:
[root@XXXX ~]# ps ajxf | grep ceph-osd
2497 25505 25504 2476 pts/0 25504 S+ 0 0:00
\_ grep --color=auto ceph-osd
1 10051 10051 10051 ? -1 Ssl 167 15584:14 /usr/bin/ceph-osd -f
--cluster ceph --id 3 --setuser ceph --setgroup ceph
1 11587 11587 11587 ? -1 Ssl 167 14991:09 /usr/bin/ceph-osd -f
--cluster ceph --id 4 --setuser ceph --setgroup ceph
1 12551 12551 12551 ? -1 Ssl 167 14687:16 /usr/bin/ceph-osd -f
--cluster ceph --id 5 --setuser ceph --setgroup ceph
1 18895 18895 18895 ? -1 Ssl 167 3052:43 /usr/bin/ceph-osd -f
--cluster ceph --id 22 --setuser ceph --setgroup ceph
1 20788 20788 20788 ? -1 Ssl 167 3314:31 /usr/bin/ceph-osd -f
--cluster ceph --id 23 --setuser ceph --setgroup ceph
1 27220 27220 27220 ? -1 Ssl 167 2240:37 /usr/bin/ceph-osd -f
--cluster ceph --id 26 --setuser ceph --setgroup ceph
Looking at the number of threads in use for OSD id 5, for instance, you can
see this:
[root@XXXX ~]# ps huH p 12551 | wc -l
349
I would expect this number to vary with the load on the cluster. After
increasing osd_op_threads to 25, I see 354 threads for that OSD id, so the
increase takes effect, but what are all the other threads? Is there an easy
way for me to see whether the configured maximum of op threads is currently
being reached? Or is there some other bottleneck that I am overlooking?
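In case it is useful, this is how I have been inspecting the threads: grouping them by thread name shows which thread pools they belong to. (A generic sketch; substitute the ceph-osd PID, e.g. 12551 from the listing above, for `$$`, which is only used here so the snippet runs anywhere.)

```shell
# Group a process's threads by thread name (Linux).
# Replace $$ with the PID of the ceph-osd in question, e.g. 12551.
pid=$$
ps -T -p "$pid" -o comm= | sort | uniq -c | sort -rn
```

If I understand correctly, `ceph daemon osd.5 dump_ops_in_flight` on the OSD host (via the admin socket) shows the ops currently queued or in flight, which might be the closest thing to seeing whether the op threads are saturated, but I am not sure how to read that against the configured maximum.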
Any clear view on this would be appreciated.
Kind regards,
Davie De Smet
Davie De Smet
Director Technical Operations and Customer Services, Nomadesk
[email protected]
+32 9 240 10 31 (Office)
Join Nomadesk: Facebook<http://www.facebook.com/Nomadesk> |
Twitter<http://twitter.com/#!/nomadesk>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com