Hi,
I created an erasure coded pool and then created RBD images by specifying
the '--data-pool' parameter. I subsequently created protected snapshots and
cloned them for the systems I was setting up. After finishing I realised that I
hadn't specified the '--data-pool' parameter when creating the clones, damn! Any
changes on the clones were being stored directly in the 'rbd_ssd' pool instead
of the erasure coded 'ec_ssd' pool...
There were 4 systems with 3 disks each. For each cloned drive I renamed it,
created a new clone (using the '--data-pool' switch this time) and then used
some Perl, which has been handy a whole bunch of times, to copy over only those
4MB chunks where the MD5 hash didn't match between the source and destination
block devices. This way the source and destination images end up 100% identical
and any blocks that still match the original parent are skipped.
PS: It would be nice to retrieve the crc values for the object store blocks, as
this would avoid reading the full images to calculate the MD5 sum per block...
for ID in 211 212 213 214; do
  for f in 1 2 3; do
    rbd mv rbd_ssd/vm-$ID-disk-$f rbd_ssd/original-$ID-disk-$f;
    rbd clone rbd_ssd/base-210-disk-"$f"@__base__ rbd_ssd/vm-$ID-disk-"$f" \
      --data-pool ec_ssd;
  done
done
rbd resize rbd_ssd/vm-213-disk-3 --size 50G;
rbd resize rbd_ssd/vm-214-disk-3 --size 1T;
for ID in 211 212 213 214; do
  for f in 1 2 3; do
    export dev1=$(rbd map rbd_ssd/original-$ID-disk-$f --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    export dev2=$(rbd map rbd_ssd/vm-$ID-disk-$f --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    # Stage 1: emit the 16-byte MD5 of every 4MB chunk of the destination.
    perl -MDigest::MD5=md5 -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
    # Stage 2: hash each source chunk; print "s" (skip) on a match, else "c" plus the chunk.
    perl -MDigest::MD5=md5 -ne 'BEGIN{$/=\4194304};$b=md5($_);
      read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
    # Stage 3: seek past skipped chunks and write only the changed ones to the destination.
    perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) {
      seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print
      $buf}' 1<> $dev2;
    rbd unmap $dev1;
    rbd unmap $dev2;
  done
done
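For anyone who finds the three-stage Perl pipeline above hard to follow, here is
a minimal sketch of the same idea (compare per block, rewrite only the blocks
that differ) using just dd and md5sum. It is not what I ran: the function names
(size_of, block_md5, sync_changed_blocks) are made up for illustration, it
assumes both images are already the same size, and it spawns processes per
block so it will be far slower than the Perl version on real images.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original commands: copy only the
# blocks of SRC that differ from DST. Both must already be the same size.

# Size in bytes of a block device or a regular file.
size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1"
    fi
}

# MD5 of block number $3 (block size $2 bytes) of file or device $1.
block_md5() {
    dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null | md5sum
}

# sync_changed_blocks SRC DST [BLOCK_SIZE]; prints the number of blocks rewritten.
sync_changed_blocks() {
    local src="$1" dst="$2" bs="${3:-4194304}"
    local size dsize blocks i=0 copied=0
    size=$(( $(size_of "$src") ))
    dsize=$(( $(size_of "$dst") ))
    if [ "$size" -ne "$dsize" ]; then
        echo "size mismatch: $size vs $dsize bytes" >&2
        return 1
    fi
    blocks=$(( (size + bs - 1) / bs ))    # round up for a partial last block
    while [ "$i" -lt "$blocks" ]; do
        if [ "$(block_md5 "$src" "$bs" "$i")" != "$(block_md5 "$dst" "$bs" "$i")" ]; then
            # conv=notrunc keeps the rest of a regular-file destination intact.
            dd if="$src" of="$dst" bs="$bs" skip="$i" seek="$i" count=1 \
               conv=notrunc 2>/dev/null
            copied=$(( copied + 1 ))
        fi
        i=$(( i + 1 ))
    done
    echo "$copied"
}
```

Usage would be along the lines of 'sync_changed_blocks "$dev1" "$dev2"' with the
mapped devices from the loop above; it also works on plain files, which makes it
easy to try out safely before pointing it at real images.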
# Compare amount of used space:
for ID in 211 212 213 214; do
  for f in 1 2 3; do
    echo -e "\nNAME PROVISIONED USED";
    rbd du rbd_ssd/original-$ID-disk-"$f" 2> /dev/null | grep -P "^\S+disk-$f\s" |
      while read n a u; do printf "%-22s %9s %9s\n" $n $a $u; done;
    rbd du rbd_ssd/vm-$ID-disk-"$f" 2> /dev/null | grep -P "^\S+disk-$f\s" |
      while read n a u; do printf "%-22s %9s %9s\n" $n $a $u; done;
  done
done
Sample output:
NAME                 PROVISIONED      USED
original-211-disk-1        4400M    28672k
vm-211-disk-1              4400M    28672k

NAME                 PROVISIONED      USED
original-211-disk-2       30720M     6312M
vm-211-disk-2             30720M     6300M

NAME                 PROVISIONED      USED
original-211-disk-3       20480M     2092M
vm-211-disk-3             20480M     2088M
vm-211-disk-3 uses 4MB less data than original-211-disk-3 but validating the
content of the images confirms that they are identical:
ID=211;
f=3;
export dev1=$(rbd map rbd_ssd/original-$ID-disk-$f --name client.admin \
  -k /etc/pve/priv/ceph.client.admin.keyring);
export dev2=$(rbd map rbd_ssd/vm-$ID-disk-$f --name client.admin \
  -k /etc/pve/priv/ceph.client.admin.keyring);
dd if=$dev1 bs=128M 2> /dev/null | sha1sum;
dd if=$dev2 bs=128M 2> /dev/null | sha1sum;
rbd unmap $dev1;
rbd unmap $dev2;
Output:
979ab34ea645ef6f16c3dbb5d3a78152018ea8e7 -
979ab34ea645ef6f16c3dbb5d3a78152018ea8e7 -
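A possible shortcut for the check above, as a minimal sketch: 'cmp' reads both
devices in a single pass and stops at the first differing byte, so a mismatch
shows up without hashing both images in full. The helper name 'same_content' is
made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when both devices (or files) hold
# identical bytes. cmp exits non-zero at the first difference, or when one
# input is shorter than the other.
same_content() {
    cmp -s -- "$1" "$2"
}

# Usage, with dev1/dev2 mapped as in the commands above:
#   same_content "$dev1" "$dev2" && echo identical || echo DIFFERENT
```

The sha1sum approach is still handy when you want a fingerprint to keep or to
compare against an image on another host.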
PS: qemu-img runs much faster than the perl nightmare above, as it knows which
blocks contain data, BUT it rewrites that data on every run, so using it with
snapshot rotation results in each snapshot consuming the full data size of the
source image. The perl method has to read both images in full (Ceph does
however feed it zeros for unallocated blocks, which aren't actually read from
anywhere), so it's much slower than qemu-img, but it only writes blocks that
are different.
The following may also be useful to others. It's a relatively simple script that
uses the Perl method above to back up images from one pool to another. The
script could easily be tweaked to use LVM snapshots as a destination, and the
method is compatible with any block device.
Notes:
We have rbd_ssd/base-210-disk-X as a protected snapshot (clone parent) and
then have 4 children where each VM has 3 disks. You would need to create the
destination images and ensure that their size matches the source images as a
prerequisite. The following script rotates 3 snapshots each time it runs and
additionally creates a snapshot of the source images (not the static clone
parent) before comparing block devices:
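#!/bin/bash
# bash rather than sh: the script relies on 'echo -en'.
src='rbd_ssd';
dst='rbd_hdd';
# Return 0 if image $1 has a snapshot named $2, otherwise 1.
rbdsnap () {
  [ -z "$1" ] && return 1;
  [ `rbd snap ls $1 | grep -Pc "^\s+\d+\s+$2\s"` -gt 0 ] && return 0 || return 1;
}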
# Backup 'template-debian-9.3' (clone parent) - Should never change so no need
# to maintain snapshots or run it on a continual basis:
#for ID in 210; do
#  for f in 1 2 3; do
#    echo -en "\t\t : Copying "$src"/base-"$ID"-disk-"$f"@__base__ to "$dst"/vm-"$ID"-disk-"$f"_backup";
#    qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp -S 4M \
#      rbd:"$src"/base-"$ID"-disk-"$f"@__base__ rbd:"$dst"/vm-"$ID"-disk-"$f"_backup;
#  done
#done
# Backup images (clone children):
for ID in 211 212 213 214; do
  for f in 1 2 3; do
    img="$dst"/vm-"$ID"-disk-"$f"_backup;
    # Drop the oldest snapshot, then shift the others up one slot. The
    # '! A && B && C' form is deliberate: 'A || B && C' parses as
    # '(A || B) && C' and would attempt the rename whenever A succeeds.
    rbdsnap "$img" snap3 && rbdsnap "$img" snap2 && rbd snap rm "$img"@snap3;
    ! rbdsnap "$img" snap3 && rbdsnap "$img" snap2 && rbd snap rename "$img"@snap2 "$img"@snap3;
    ! rbdsnap "$img" snap2 && rbdsnap "$img" snap1 && rbd snap rename "$img"@snap1 "$img"@snap2;
    # Preserve the previous backup contents before they are overwritten:
    rbdsnap "$img" snap1 || rbd snap create "$img"@snap1;
    rbd snap create "$src"/vm-"$ID"-disk-"$f"@backupinprogress;
  done
  for f in 1 2 3; do
    echo -en "\t\t : Copying "$src"/vm-"$ID"-disk-"$f" to "$dst"/vm-"$ID"-disk-"$f"_backup";
    #qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp -S 4M \
    #  rbd:"$src"/vm-"$ID"-disk-"$f"@backupinprogress rbd:"$dst"/vm-"$ID"-disk-"$f"_backup;
    export dev1=$(rbd map "$src"/vm-"$ID"-disk-"$f@backupinprogress" --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    export dev2=$(rbd map "$dst"/vm-"$ID"-disk-"$f"_backup --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    # Hash destination chunks, compare against source chunks, write only the differences:
    perl -MDigest::MD5=md5 -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
    perl -MDigest::MD5=md5 -ne 'BEGIN{$/=\4194304};$b=md5($_);
      read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
    perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) {
      seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print
      $buf}' 1<> $dev2;
    rbd unmap $dev1;
    rbd unmap $dev2;
    rbd snap rm "$src"/vm-"$ID"-disk-"$f"@backupinprogress;
  done
done
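The rotation part of the script is the bit most likely to be misread, so here
is a dry-run sketch of just that logic. It only prints the steps it would take
for a given set of existing snapshots instead of calling rbd, and the function
name 'plan_rotation' is made up for illustration; the snap1..snap3 names are the
ones used in the script above.

```shell
#!/bin/bash
# Hypothetical dry-run helper: given the snapshot names that currently exist
# (as arguments), print the rotation steps for snap1..snap3 in order.
plan_rotation() {
    local s1=0 s2=0 s3=0 name
    for name in "$@"; do
        case "$name" in
            snap1) s1=1 ;;
            snap2) s2=1 ;;
            snap3) s3=1 ;;
        esac
    done
    # Drop the oldest snapshot only when a newer one is ready to replace it.
    if [ "$s3" = 1 ] && [ "$s2" = 1 ]; then
        echo "rm snap3"; s3=0
    fi
    # Shift the remaining snapshots one slot towards the old end.
    if [ "$s3" = 0 ] && [ "$s2" = 1 ]; then
        echo "rename snap2 snap3"; s2=0; s3=1
    fi
    if [ "$s2" = 0 ] && [ "$s1" = 1 ]; then
        echo "rename snap1 snap2"; s1=0; s2=1
    fi
    # Freeze the current backup contents before the next copy overwrites them.
    if [ "$s1" = 0 ]; then
        echo "create snap1"
    fi
}

# Example: plan_rotation snap1 snap2 snap3
```

On the first run (no snapshots yet) this only creates snap1, and the chain
fills up over the next two runs before the oldest snapshot starts being removed.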
Commenting out everything from 'export dev1' to 'rbd unmap $dev2' and
uncommenting the qemu-img command yields the following:
real 0m48.598s
user 0m14.583s
sys 0m10.986s
[admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
NAME                       PROVISIONED   USED
vm-211-disk-3_backup@snap3      20480M  2764M
vm-211-disk-3_backup@snap2      20480M  2764M
vm-211-disk-3_backup@snap1      20480M  2764M
vm-211-disk-3_backup            20480M  2764M
<TOTAL>                         20480M 11056M
Repeating the copy using the Perl solution is much slower but as the VM is
currently off nothing has changed and each snapshot consumes zero data:
real 1m49.000s
user 1m34.339s
sys 0m17.847s
[admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
warning: fast-diff map is not enabled for vm-211-disk-3_backup. operation may be slow.
NAME                       PROVISIONED   USED
vm-211-disk-3_backup@snap3      20480M  2764M
vm-211-disk-3_backup@snap2      20480M      0
vm-211-disk-3_backup@snap1      20480M      0
vm-211-disk-3_backup            20480M      0
<TOTAL>                         20480M  2764M
PS: Not sure if this is a Ceph display bug: why would the snapshot base be
reported as not consuming any data while the first snapshot (rotated to 'snap3')
reports all the usage? Purging all snapshots yields the following:
[admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
warning: fast-diff map is not enabled for vm-211-disk-3_backup. operation may be slow.
NAME                 PROVISIONED  USED
vm-211-disk-3_backup      20480M 2764M
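To compare the 'rbd du' listings above without adding up rows by hand, here is
a small sketch that sums the USED column in MiB. It assumes the plain-text
output style shown in this post (suffixes such as k, M, G or T, and a bare 0),
which other Ceph releases may format differently; the function name
'du_used_total_mib' is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read 'rbd du' text output on stdin and print the sum
# of the USED column in MiB, skipping the header, warning and <TOTAL> rows.
du_used_total_mib() {
    awk '
        $1 == "NAME" || $1 == "<TOTAL>" || $1 == "warning:" { next }
        NF >= 3 {
            v = $3
            n = v + 0                      # numeric prefix, e.g. 2764 from "2764M"
            if (v ~ /k$/)      n = n / 1024
            else if (v ~ /G$/) n = n * 1024
            else if (v ~ /T$/) n = n * 1024 * 1024
            total += n
        }
        END { printf "%d\n", total }
    '
}

# Usage: rbd du rbd_hdd/vm-211-disk-3_backup 2> /dev/null | du_used_total_mib
```

Fed the qemu-img listing above it gives 11056 and fed the Perl listing it gives
2764, matching the totals that 'rbd du' itself reported.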
Regards
David Herselman