Hello everyone, 

I've recently performed a hardware upgrade on our small two-osd-server ceph 
cluster, which seems to have broken it. We are using ceph to back cloudstack 
rbd images for vms. All of our servers run Ubuntu 14.04 LTS with the latest 
updates and kernel 4.4.6 from the ubuntu repo. 

Previous hardware: 

2 x osd servers, each with 9 sas osds, 32gb ram, a 12-core Intel 2620 cpu @ 
2Ghz and 2 consumer ssds for journals. 40gbit/s Infiniband networking using 
IPoIB. 

The following things were upgraded: 

1. Journal ssds were upgraded from consumer ssds to Intel S3710 200gb. We now 
have 5 osds per ssd. 
2. Added an additional osd server with 64gb ram, 10 osds and an Intel 2670 cpu 
@ 2.6Ghz. 
3. Upgraded the ram on the two old osd servers to 64gb. 
4. Installed an additional osd disk in each old server, so all servers now 
have 10 osds. 

After adding the third osd server and finishing the initial sync, the cluster 
worked okay for 1-2 days and no issues were noticed. On the third day my 
monitoring system started reporting a bunch of issues from the ceph cluster as 
well as from our virtual machines. This tends to happen between 7:20am and 
7:40am and lasts for about 2-3 hours before things return to normal. I've 
checked the osd servers and there is nothing I could find in cron or elsewhere 
that starts around 7:20am. 
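For reference, a sweep like the following (paths are Ubuntu 14.04 defaults; 
root is needed to see other users' crontabs) can be used to list everything 
cron and at might fire around that time:

```shell
# Sweep the usual scheduler locations for anything firing around 7:20am.
# Paths below are Ubuntu 14.04 defaults.
cat /etc/crontab 2>/dev/null              # system crontab: when do cron.daily etc. run?
ls /etc/cron.d /etc/cron.hourly /etc/cron.daily 2>/dev/null
for u in $(cut -d: -f1 /etc/passwd); do   # per-user crontabs
    crontab -l -u "$u" 2>/dev/null
done
atq 2>/dev/null || true                   # pending at(1) jobs
```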

The problem is as follows: the new osd server's load goes to 400+ with the 
ceph-osd processes consuming all cpu resources. ceph -w shows a high number of 
slow requests relating to osds that belong to the new server. The log files 
show the following: 

2016-04-20 07:39:04.346459 osd.7 192.168.168.200:6813/2650 2 : cluster [WRN] 
slow request 30.032033 seconds old, received at 2016-04-20 07:38:34.314014: 
osd_op(client.140476549.0:13203438 rbd_data.2c9de71520eedd1.0000000000000621 
[stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2572288~4096] 
5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for 
subops from 22 
2016-04-20 07:39:04.346465 osd.7 192.168.168.200:6813/2650 3 : cluster [WRN] 
slow request 30.031878 seconds old, received at 2016-04-20 07:38:34.314169: 
osd_op(client.140476549.0:13203439 rbd_data.2c9de71520eedd1.0000000000000621 
[stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1101824~8192] 
5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for 
rw locks 
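To see which peer osds the slow requests are actually stuck on, the "waiting 
for subops from" lines can be tallied, e.g. (assuming the default cluster log 
path; adjust if yours differs):

```shell
# Count how often each peer osd shows up as the one being waited on.
# /var/log/ceph/ceph.log is the default cluster log path; adjust as needed.
grep 'waiting for subops from' /var/log/ceph/ceph.log 2>/dev/null \
  | sed 's/.*waiting for subops from //' \
  | tr ',' '\n' | sort | uniq -c | sort -rn
```

If the counts cluster on a handful of osd ids, those disks (or the host 
holding them) are the place to dig further.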



Practically every osd is involved in the slow requests, and they tend to be 
between the two old osd servers and the new one. As far as I can see there 
were no issues between the two old servers themselves. 
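One way to dig further here is to ask an implicated osd for its slowest recent 
ops over the admin socket (osd.22 below is just an example id taken from the 
log above; the command must run on the node hosting that osd):

```shell
# Dump the slowest recent ops recorded by this osd daemon; the output shows
# where each op spent its time (queued, journal, waiting for subops, ...).
ceph daemon osd.22 dump_historic_ops | head -n 60
```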

The first thing I checked was the networking. No issues were identified by 
running ping -i .1 <servername>, nor by using hping3 for tcp connection 
checks. The network tests ran for over a week and not a single packet was 
lost; the slow requests took place while the tests were running. 

I've also checked the osd and journal ssd disks and was not able to identify 
anything problematic. 
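In case it is useful, a crude way to watch which devices are busy during the 
problem window without extra tooling (iostat -x from the sysstat package gives 
nicer output if it is installed):

```shell
# Poor man's iostat: diff /proc/diskstats across 5 seconds; devices whose
# cumulative counters (reads, writes, time spent doing I/O) moved are the
# busy ones.
cat /proc/diskstats > /tmp/diskstats.1
sleep 5
cat /proc/diskstats > /tmp/diskstats.2
diff /tmp/diskstats.1 /tmp/diskstats.2 || true
```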

With all osds on the new server stopped, there are no issues between the two 
old osd servers. I've left the new server out of the cluster for a few days 
and had no issues at all. 

I am a bit lost as to what else to try and how to debug this issue. Could 
someone please help me? 

Many thanks 

Andrei 

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
