Hi Andrija, 

I've got at least two more stories of a similar nature. One is from a friend of 
mine running a Ceph cluster and one is from me. Both of our clusters are pretty 
small. My cluster has only two OSD servers with 8 OSDs each and 3 mons, with one 
SSD journal per 4 OSDs. My friend's cluster has 3 mons and 3 OSD servers with 
4 OSDs each, also with one SSD journal per 4 OSDs. Both clusters are connected 
with 40Gbit/s IP-over-InfiniBand links. 

We had the same issue while upgrading to Firefly. However, we did not add any 
new disks; we just ran the "ceph osd crush tunables optimal" command after 
completing the upgrade. 
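
For anyone trying to reproduce this, the sequence was roughly the following (a 
sketch from memory rather than a copy of my shell history): 

ceph -v                            # check the installed version on each node first 
ceph osd crush tunables optimal    # this is what kicked off the big remap 
ceph -s                            # watch the degraded percentage climb 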

Both of our clusters were "down" as far as the virtual machines were concerned. 
All VMs crashed because of the lack of IO. That was rather problematic, given 
that Ceph is typically so good at staying alive through failures and upgrades, 
so there does seem to be a problem with the upgrade. I wish the devs had added 
a big note in red letters saying that running this command will likely hurt 
your cluster's performance and most likely kill all your VMs, so please shut 
down your VMs first if you do not want to risk data loss. 

I've changed the default values to reduce the load during recovery and also to 
tune a few things performance-wise. My settings were: 

osd recovery max chunk = 8388608 
osd recovery op priority = 2 
osd max backfills = 1 
osd recovery max active = 1 
osd recovery threads = 1 
osd disk threads = 2 
filestore max sync interval = 10 
filestore op threads = 20 
filestore_flusher = false 
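
These are plain ceph.conf options; the recovery throttles can also be pushed to 
the running OSDs without a restart, something along these lines (a sketch, and 
note that injectargs wants the underscore form of the option names): 

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 2' 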

However, this didn't help much. I noticed that shortly after running the 
tunables command the iowait on my guest VMs quickly jumped to 50%, and then to 
99% a minute later. This happened on all VMs at once. During the recovery phase 
I ran the "rbd -p <poolname> ls -l" command several times and it took between 
20 and 40 minutes to complete; it typically takes less than 2 seconds when the 
cluster is not in recovery mode. 
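
For anyone who wants to reproduce the measurement, the standard status commands 
are enough (nothing here is specific to our setup): 

ceph -w                        # live stream of recovery/backfill progress 
ceph -s                        # overall health and the degraded percentage 
time rbd -p <poolname> ls -l   # the listing that normally returns in under 2 seconds 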

My mate's cluster used the same settings apart from the last three, and he saw 
exactly the same behaviour. 

One other thing I've noticed: somewhere in the docs I read that switching to 
the optimal tunables should move no more than 10% of your data. However, in 
both of our cases the cluster reported just over 30% degraded, and it took the 
better part of 9 hours to finish the data reshuffling. 
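
To double-check which tunables profile is actually in effect and to follow how 
far along the reshuffle is, these two commands should be enough (both exist in 
Firefly as far as I know): 

ceph osd crush show-tunables   # shows the tunables currently applied to the CRUSH map 
ceph pg stat                   # one-line summary of degraded/recovering PGs 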


Any comments from the Ceph team or other Ceph gurus on: 

1. What did we do wrong in our upgrade process? 
2. What options should we have used to keep our VMs alive? 


Cheers 

Andrei 




----- Original Message -----

From: "Andrija Panic" <[email protected]> 
To: [email protected] 
Sent: Sunday, 13 July, 2014 9:54:17 PM 
Subject: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the 
same time 

Hi, 

after the ceph upgrade (0.72.2 to 0.80.3) I issued "ceph osd crush tunables 
optimal" and after only a few minutes I added 2 more OSDs to the CEPH 
cluster... 

So these 2 changes were more or less done at the same time - rebalancing 
because of tunables optimal, and rebalancing because of adding new OSDs... 

Result - all VMs living on CEPH storage have gone mad, effectively no disk 
access, blocked so to speak. 

Since this rebalancing took 5-6 hours, I had a bunch of VMs down for that long... 

Did I do wrong by causing "2 rebalancings" to happen at the same time? 
Is this behaviour normal, to cause great load on all VMs because they are 
effectively unable to access CEPH storage? 

Thanks for any input... 
-- 

Andrija Panić 

_______________________________________________ 
ceph-users mailing list 
[email protected] 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
