[ceph-users] Recovery after datacenter outage

Christian Zunker Fri, 22 Jun 2018 02:26:16 -0700

Hi List,

we are running a ceph cluster (12.2.5) as the backend for our OpenStack cloud.

Yesterday our datacenter had a power outage. As if that weren't enough, our
ceph cluster was also split apart by networking problems.

First of all, thanks a lot to the ceph developers. After the network was
back to normal, ceph recovered by itself. You saved us from a lot of downtime,
lack of sleep and insanity.

Now to our problem/question:
After ceph recovered, we tried to bring up our VMs. Their cinder volumes
are stored in ceph. None of the VMs started because of I/O errors during
boot:
[    4.393246] JBD2: recovery failed
[    4.395949] EXT4-fs (vda1): error loading journal
[    4.400811] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /root failed: Input/output error
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory

We tried to recover the disks with several methods, but each attempt failed
for a different reason. What finally helped was rebuilding the object map of
each image:
rbd object-map rebuild volumes/<uuid>
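With many volumes, the per-image rebuild above can be looped over a whole pool. The following is only a minimal sketch, not a verified recovery procedure: the function name `rebuild_invalid_object_maps` is hypothetical, and it assumes the `rbd` CLI has admin access to the pool and that only images whose `rbd info` flags list `object map invalid` need a rebuild (the post itself rebuilt every image).

```shell
#!/bin/sh
# Hypothetical helper (sketch): for every image in a pool, check the
# `rbd info` output and rebuild the object map only when the flags
# report "object map invalid".
rebuild_invalid_object_maps() {
  local pool="$1" images img info
  images=$(rbd ls "$pool") || return 1        # one image name per line
  for img in $images; do                      # assumes names contain no whitespace
    info=$(rbd info "$pool/$img") || return 1
    case "$info" in
      *"object map invalid"*)
        echo "rebuilding object map: $pool/$img"
        rbd object-map rebuild "$pool/$img" || return 1
        ;;
    esac
  done
}
```

Usage, with the pool name from the post: `rebuild_invalid_object_maps volumes`. Checking the flag first avoids rewriting object maps that ceph does not consider invalid; dropping the `case` test would rebuild every image, as was done here.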

From what we understood, object-map is an internal ceph feature for speeding up operations.
How can this lead to I/O errors in our VMs?
Is this the expected way to recover?
Did we miss something?
Is there any documentation describing what leads to invalid object maps and
how to recover from them? (We did not find any documentation on that topic.)


regards
Christian

_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com