On 04/11/2014 02:45 PM, Greg Poirier wrote:
So... our storage problems persisted for about 45 minutes. I gave an entire hypervisor's worth of VMs (approx. 30) time to recover, and none of them recovered on their own. In the end, we had to stop and start every VM (easily done, just alarming). Once rebooted, the VMs were of course fine.
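For anyone needing to do the same, a minimal sketch of the bulk stop/start, assuming the guests are libvirt-managed KVM domains (the loop below is generic illustration, not Greg's actual tooling):

    # hard-stop and restart every running libvirt domain on this hypervisor
    for dom in $(virsh list --name); do
        virsh destroy "$dom"    # hard power-off; the guest is already wedged
        virsh start "$dom"
    done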
So that's interesting. I'm going to try this myself as well since I think they should continue I/O at some point.
I marked the two full OSDs as down and out. I am a little concerned that these two are full while the cluster, in general, is only at 50% capacity; it appears we may have a hot spot. I'm going to look into that later today. Also, I'm not sure how it happened, but pgp_num is lower than pg_num. I hadn't noticed that until last night; I will address it as well. This probably happened when I last resized placement groups, or potentially when I set up the object storage pools.

On Fri, Apr 11, 2014 at 3:49 AM, Wido den Hollander <w...@42on.com> wrote:

On 04/11/2014 09:23 AM, Josef Johansson wrote:

On 11/04/14 09:07, Wido den Hollander wrote:

On 11 April 2014 at 8:50, Josef Johansson <jo...@oderland.se> wrote:

Hi,

On 11/04/14 07:29, Wido den Hollander wrote:

On 11 April 2014 at 7:13, Greg Poirier <greg.poir...@opower.com> wrote:

One thing to note... all of our KVM VMs have to be rebooted. This is something I wasn't expecting. I tried waiting for them to recover on their own, but that isn't happening. Rebooting them restores service immediately. :/ Not ideal.

A reboot isn't really required though. It could be that the VM itself is in trouble, but from a librados/librbd perspective, I/O should simply continue as soon as an osdmap has been received without the "full" flag. It could be that you have to wait some time before the VM continues; this can take up to 15 minutes.

With other storage solutions you would have to change the timeout value for each disk, e.g. from 60 secs to 180 secs, for the VMs to survive storage problems. Does Ceph handle this differently somehow?

It's not that RBD does it differently. Librados simply blocks the I/O, and thus so does librbd, which in turn causes Qemu to block. I've seen VMs survive RBD issues for periods longer than 60 seconds; given some time, they continued again. Which exact setting are you talking about? I'm talking about a Qemu/KVM VM running with a VirtIO drive.

cat /sys/block/*/device/timeout (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009465) This file is non-existent for my Ceph VirtIO drive, however, so it seems RBD handles this.

Well, I don't think it's handled by RBD; VirtIO simply doesn't have the timeout. That's probably only in the SCSI driver.

Wido

I have just para-virtualized VMs to compare with right now, and they don't have it inside the VM, but that's expected. From my understanding it should have been there if it was an HVM. Whenever the timeout was reached, an error occurred and the disk was set to read-only mode.

Cheers,
Josef

On Thu, Apr 10, 2014 at 10:12 PM, Greg Poirier <greg.poir...@opower.com> wrote:

Going to try increasing the full ratio. Disk utilization wasn't really growing at an unreasonable pace. I'm going to keep an eye on it for the next couple of hours and down/out the OSDs if necessary. We have four more machines that we're in the process of adding (which doubles the number of OSDs), but we got held up by some networking nonsense. Thanks for the tips.
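On the pgp_num vs. pg_num mismatch Greg mentions above, a minimal sketch of checking and correcting it (the pool name 'rbd' and the value 512 are placeholders; raising pgp_num triggers data movement, so best done when the cluster is otherwise healthy):

    # compare the two values for a pool
    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num
    # bring pgp_num up to match pg_num so the split PGs are actually
    # placed independently; expect rebalancing when this is applied
    ceph osd pool set rbd pgp_num 512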
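And on the timeout question in the Wido/Josef exchange above: guests using an emulated SCSI/SATA disk (sdX) do expose the per-disk timeout Josef refers to, and it can be raised to ride out storage stalls. A sketch, run as root inside such a guest (virtio-blk devices, vdX, have no such file, as Wido notes):

    # inspect the current SCSI command timeout (commonly 30 or 60 seconds)
    cat /sys/block/sda/device/timeout
    # raise it to 180 seconds so the guest tolerates longer storage stalls
    echo 180 > /sys/block/sda/device/timeout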
On Thu, Apr 10, 2014 at 9:51 PM, Sage Weil <s...@inktank.com> wrote:

On Thu, 10 Apr 2014, Greg Poirier wrote:

Hi, I have about 200 VMs with a common RBD volume as their root filesystem and a number of additional filesystems on Ceph. All of them have stopped responding. One of the OSDs in my cluster is marked full. I tried stopping that OSD to force things to rebalance, or at least go into degraded mode, but nothing is responding still. I'm not exactly sure what to do or how to investigate. Suggestions?

Try marking the osd out or partially out (ceph osd reweight N .9) to move some data off, and/or adjust the full ratio up (ceph pg set_full_ratio .95). Note that this becomes increasingly dangerous as OSDs get closer to full; add some disks.

sage
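Pulling Sage's suggestions together, a sketch of the remediation sequence (the OSD id 12 is a placeholder):

    # identify which OSDs are full / near full
    ceph health detail
    # either drain the full OSD a little by lowering its reweight...
    ceph osd reweight 12 0.9
    # ...or take it out entirely and let its data move elsewhere
    ceph osd out 12
    # and/or raise the full threshold slightly -- increasingly risky as
    # OSDs approach 100%, so treat it as a stopgap while adding disks
    ceph pg set_full_ratio 0.95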
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com