Hi,

We have been playing with Ceph for a while (we want to use it as our OpenStack 
storage backend) and I have to say it is a really nice piece of software :)

We are still in the design stage, but we plan to use it combined with Infiniband, 
SSDs for caching and some other cool stuff. I will post more details once we 
set up the first production deployment.

But we are still struggling to find the best way to keep an off-site backup 
for disaster recovery purposes.

So far, what seems to make the most sense is to use differential snapshots to 
transfer backups off-site (building a script based on: 
http://ceph.com/dev-notes/incremental-snapshots-with-rbd/ ).
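
For reference, the core of that approach is just piping "rbd export-diff" on the 
primary cluster into "rbd import-diff" against the backup cluster. Here is a 
minimal sketch in Python of one incremental transfer (pool, image and snapshot 
names and the backup conf path are made-up placeholders, and it assumes the 
initial full copy of the image already exists on the backup cluster):

#!/usr/bin/env python
# Rough sketch of one incremental transfer, following the export-diff /
# import-diff workflow from the blog post above. Assumes the image and the
# "from" snapshot already exist on both clusters and that the backup
# cluster is reachable from this host via its own conf file / keyring.
import subprocess

def send_diff(pool, image, from_snap, to_snap, backup_conf):
    """Pipe the changes between two snapshots into the backup cluster."""
    spec = '%s/%s' % (pool, image)
    export = subprocess.Popen(
        ['rbd', 'export-diff', '--from-snap', from_snap,
         '%s@%s' % (spec, to_snap), '-'],
        stdout=subprocess.PIPE)
    subprocess.check_call(
        ['rbd', '-c', backup_conf, 'import-diff', '-', spec],
        stdin=export.stdout)
    if export.wait() != 0:
        raise RuntimeError('rbd export-diff failed for %s' % spec)

# e.g. send_diff('volumes', 'volume-xyz', 'backup-week-25', 'backup-week-26',
#                '/etc/ceph/ceph-backup.conf')

Depending on the final setup, this could also be piped through ssh to a host at 
the remote site instead of talking to the backup cluster directly from here.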

Our idea is to keep 4 incremental copies per month (1 per week) at a different 
location (bandwidth is not a big issue, as we will have a rented fiber that we 
will "light" ourselves).

So, if we decide to go for that solution, we would keep one snapshot per VM 
during the whole week, dump the differences off-site, and rotate the snapshots 
every week.
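
The weekly rotation itself could then be something like this, reusing the 
send_diff() helper sketched above (the snapshot naming and the 
keep-exactly-one-snapshot policy are just the convention I am assuming):

import datetime
import subprocess

def weekly_backup(pool, image, previous_snap, backup_conf):
    """Create this week's snapshot, ship the diff since last week's
    snapshot, then drop the old snapshot so each image keeps exactly one
    backup snapshot as the base for next week's diff."""
    spec = '%s/%s' % (pool, image)
    this_week = 'backup-%s' % datetime.date.today().strftime('%Y-%W')

    # Take this week's snapshot on the source cluster.
    subprocess.check_call(['rbd', 'snap', 'create',
                           '%s@%s' % (spec, this_week)])

    # Ship only the changes since last week (helper from the sketch above).
    send_diff(pool, image, previous_snap, this_week, backup_conf)

    # Rotate: last week's snapshot is no longer needed as a diff base.
    subprocess.check_call(['rbd', 'snap', 'rm',
                           '%s@%s' % (spec, previous_snap)])
    return this_week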

Well, that seems to be a nice idea, but we found a couple of issues which 
prevent it from being the perfect solution:


- What if a user builds a volume combining two or more RBD volumes, e.g. with 
the Linux device mapper inside a VM? With the method described above, the 
off-site copies of those volumes would be out of sync with each other.

- OpenStack doesn't like you playing directly with RBD volumes. If you create 
a snapshot directly in Ceph, OpenStack will not be aware of it, and e.g. volume 
delete operations will fail. And since diff snapshots require us to always keep 
at least one snapshot around, delete operations would always fail.

Obviously we found some workarounds for these issues. For example, we could 
modify the OpenStack Cinder driver so it removes all of an image's snapshots 
before deleting the volume. But even if we do that, if OpenStack tries to delete 
a volume while we are dumping the snapshot differences, the delete operation 
will still fail, as the snapshot would be in use. And to avoid desynchronization 
of data striped across several volumes, we could create all the snapshots first 
and only then dump the differences; the snapshots would be taken almost at the 
same time, so data striped across volumes would be almost synchronized (a rough 
sketch of this is below), but it's still not an ideal solution.
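
For the data-striped-across-volumes case, the "snapshot everything first, 
transfer later" idea could look roughly like this (again reusing a send_diff() 
helper like the one sketched earlier; the snapshots are still only 
near-simultaneous, not atomic across volumes):

import subprocess

def backup_vm(pool, volumes, old_snap, new_snap, backup_conf):
    """Two-phase backup for a VM whose data is striped across several RBD
    volumes: first take all the snapshots back to back (cheap and quick),
    then do the slow export-diff/import-diff transfers afterwards. The
    copies are only approximately consistent with each other."""
    # Phase 1: snapshot every volume as close together in time as possible.
    for vol in volumes:
        subprocess.check_call(['rbd', 'snap', 'create',
                               '%s/%s@%s' % (pool, vol, new_snap)])

    # Phase 2: ship the diffs and rotate, one volume at a time.
    for vol in volumes:
        send_diff(pool, vol, old_snap, new_snap, backup_conf)
        subprocess.check_call(['rbd', 'snap', 'rm',
                               '%s/%s@%s' % (pool, vol, old_snap)])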

So, as these solutions are not ideal, we have been checking other options like:


- Pool snapshots: It doesn't seem like much can be done with them; as far as 
I've seen, the only option is to retrieve or store individual RADOS objects 
from a pool snapshot. So there doesn't seem to be a way to dump a whole pool to 
another Ceph cluster, and even if it could be done (maybe by copying all the 
objects from one pool to the other, see the sketch after this list) I don't 
know how well it would work with RBD pools.

- Geo-replication: I still need to upgrade our test cluster to play with this, 
but as far as I understand, geo-replication is only for RADOSGW pools, so there 
is no way to use it for RBD pools.
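
Just to illustrate the "copy all the objects from one pool to the other" idea 
from the pool snapshots point above, something along these lines with the 
python-rados bindings would do a naive object-by-object copy between two 
clusters. It copies the live object contents (not a point-in-time pool 
snapshot) and ignores xattrs/omap and RBD header consistency, which is exactly 
why I doubt it would work well for RBD pools:

import rados

def copy_pool(src_conf, dst_conf, pool):
    """Naive object-by-object copy of a pool into a pool with the same
    name on another cluster. Not a consistent point-in-time copy: objects
    are read live, and xattrs/omap data are not copied at all."""
    src = rados.Rados(conffile=src_conf)
    dst = rados.Rados(conffile=dst_conf)
    src.connect()
    dst.connect()
    try:
        src_io = src.open_ioctx(pool)
        dst_io = dst.open_ioctx(pool)
        for obj in src_io.list_objects():
            size, _mtime = src_io.stat(obj.key)
            data = src_io.read(obj.key, size) if size else ''
            dst_io.write_full(obj.key, data)
        src_io.close()
        dst_io.close()
    finally:
        src.shutdown()
        dst.shutdown()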

But they don't seem to work the way we would like... So, does anybody have any 
interesting ideas we could use? Or is there any super amazing new feature coming 
soon for this? (We don't need to be production-ready until the 1st of September...)

Any help would certainly be appreciated :)

Thanks!

Best regards,
Xavier Trilla P.
Silicon Hosting<https://siliconhosting.com/>

Haven't you heard about Bare Metal Cloud yet?
The evolution of VPS servers has arrived!

More information at: siliconhosting.com/cloud<https://siliconhosting.com/cloud>
