[ceph-users] Copy locked parent and clones to another pool

David Herselman Sun, 24 Dec 2017 13:14:36 -0800

Hi,

I created an erasure coded pool and thereafter created RBD images by specifying 
the '--data-pool' parameter. I subsequently created locked snapshots and cloned 
them for systems I was setting up. After finishing I realised that I hadn't 
specified the '--data-pool' parameter when creating the clones, damn! Any 
changes on the clones were being stored directly in the 'rbd_ssd' pool, instead 
of the erasure coded 'ec_ssd' pool...


There were 4 systems with 3 disks each so I, for each cloned drive, renamed it, 
created a new one (using the '--data-pool' switch this time) and then used some 
Perl that has been handy a whole bunch of times to only copy over 4MB chunks 
when the MD5 hash didn't match between the source and destination block devices.

This way the source and destination images are 100% identical and any blocks 
that match the original parent are skipped.


PS: It would be nice to retrieve the crc values for the object store blocks, as 
this would avoid reading the full images to calculate the MD5 sum per block...



for ID in 211 212 213 214; do
  for f in 1 2 3; do
    rbd mv rbd_ssd/vm-$ID-disk-$f rbd_ssd/original-$ID-disk-$f;
    rbd clone rbd_ssd/base-210-disk-"$f"@__base__ rbd_ssd/vm-$ID-disk-"$f" \
      --data-pool ec_ssd;
  done
done
rbd resize rbd_ssd/vm-213-disk-3 --size 50G;
rbd resize rbd_ssd/vm-214-disk-3 --size 1T;
for ID in 211 212 213 214; do
  for f in 1 2 3; do
    export dev1=$(rbd map rbd_ssd/original-$ID-disk-$f --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    export dev2=$(rbd map rbd_ssd/vm-$ID-disk-$f --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    # Stage 1: print one 16-byte MD5 digest per 4MB chunk of the destination.
    # Stage 2: hash each source chunk; print "s" (skip) on a match, else "c" + chunk.
    # Stage 3: seek past skipped chunks and write changed chunks into the destination.
    perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
      perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};$b=md5($_);
        read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
          perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) {
            seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print $buf}' 1<> $dev2;
    rbd unmap $dev1;
    rbd unmap $dev2;
  done
done
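
For anyone who finds the three-stage Perl pipeline above hard to follow, here 
is a slower but more readable sketch of the same idea: compare the MD5 sum of 
each chunk and only overwrite chunks that differ. This is not the method used 
above; 'sync_changed_chunks' is a hypothetical helper name, it assumes GNU 
coreutils (dd, md5sum, wc) and it starts two dd processes per chunk, so 
treat it as an illustration rather than something to run on large images:

```shell
# Hypothetical helper (not from the original post): copy only the chunks of
# SRC that differ from DST. Both must have the same size.
# Prints the number of chunks that were copied.
sync_changed_chunks() {
  local src dst chunk size chunks i a b copied
  src=$1; dst=$2; chunk=${3:-4194304}
  # wc -c is fine for regular files; for block devices
  # 'blockdev --getsize64 DEVICE' avoids reading the whole device.
  size=$(wc -c < "$src")
  chunks=$(( (size + chunk - 1) / chunk ))
  i=0; copied=0
  while [ "$i" -lt "$chunks" ]; do
    a=$(dd if="$src" bs="$chunk" skip="$i" count=1 2>/dev/null | md5sum)
    b=$(dd if="$dst" bs="$chunk" skip="$i" count=1 2>/dev/null | md5sum)
    if [ "$a" != "$b" ]; then
      # conv=notrunc overwrites the chunk in place without truncating DST
      dd if="$src" of="$dst" bs="$chunk" skip="$i" seek="$i" count=1 \
        conv=notrunc 2>/dev/null
      copied=$((copied + 1))
    fi
    i=$((i + 1))
  done
  echo "$copied"
}
```

Example: 'sync_changed_chunks $dev1 $dev2' would use the default 4MB chunk 
size; a third argument overrides it.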


# Compare amount of used space:
for ID in 211 212 213 172; do
  for f in 1 2 3; do
    echo -e "\nNAME                 PROVISIONED      USED";
    rbd du rbd_ssd/original-$ID-disk-"$f" 2> /dev/null | grep -P "^\S+disk-$f\s" |
      while read n a u; do printf "%-22s %9s %9s\n" $n $a $u; done;
    rbd du rbd_ssd/vm-$ID-disk-"$f" 2> /dev/null | grep -P "^\S+disk-$f\s" |
      while read n a u; do printf "%-22s %9s %9s\n" $n $a $u; done;
  done
done

Sample output:
NAME                 PROVISIONED      USED
original-211-disk-1        4400M    28672k
vm-211-disk-1              4400M    28672k

NAME                 PROVISIONED      USED
original-211-disk-2       30720M     6312M
vm-211-disk-2             30720M     6300M

NAME                 PROVISIONED      USED
original-211-disk-3       20480M     2092M
vm-211-disk-3             20480M     2088M


vm-211-disk-3 uses 4MB less data than original-211-disk-3 but validating the 
content of the images confirms that they are identical:

ID=211;
f=3;
export dev1=$(rbd map rbd_ssd/original-$ID-disk-$f --name client.admin \
  -k /etc/pve/priv/ceph.client.admin.keyring);
export dev2=$(rbd map rbd_ssd/vm-$ID-disk-$f --name client.admin \
  -k /etc/pve/priv/ceph.client.admin.keyring);
dd if=$dev1 bs=128M 2> /dev/null | sha1sum;
dd if=$dev2 bs=128M 2> /dev/null | sha1sum;
rbd unmap $dev1;
rbd unmap $dev2;

Output:
979ab34ea645ef6f16c3dbb5d3a78152018ea8e7  -
979ab34ea645ef6f16c3dbb5d3a78152018ea8e7  -
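
Hashing both devices always reads them in full. As an alternative sketch 
(not what I ran above), 'cmp' stops at the first mismatch and reports where 
it is, which can be mapped to a 4MB chunk number. 'first_differing_chunk' is a 
hypothetical helper name and the output parsing assumes the usual 
"differ: byte N" (GNU) or "differ: char N" (POSIX) message:

```shell
# Hypothetical helper: print "identical" if A and B hold the same bytes,
# otherwise print the zero-based index of the first chunk that differs.
# Returns 0 when identical, 1 when different, 2 on size mismatch or error.
first_differing_chunk() {
  local a b chunk out rc byte
  a=$1; b=$2; chunk=${3:-4194304}
  out=$(cmp "$a" "$b" 2>/dev/null) && rc=0 || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "identical"
    return 0
  fi
  # cmp reports a 1-based byte offset of the first difference
  byte=$(printf '%s\n' "$out" |
    sed -n 's/.*differ: [a-z]* \([0-9][0-9]*\).*/\1/p')
  if [ -z "$byte" ]; then
    echo "size mismatch or read error"
    return 2
  fi
  echo $(( (byte - 1) / chunk ))
  return 1
}
```

Example: 'first_differing_chunk $dev1 $dev2' prints "identical" when the two 
mapped devices match.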


PS: qemu-img runs much faster than the perl nightmare above, as it knows which 
blocks contain data, BUT it copies data every time so using it with snapshot 
rotations results in each snapshot being the full source image data size. The 
perl method results in reading overhead (Ceph does however feed it zeros for 
unallocated blocks, which aren't actually read from anywhere) so it's much 
slower than qemu-img but exclusively copies blocks which are different.



The following may also be useful to others. It's a relatively simple script 
that uses the Perl method above to back up images from one pool to another. 
The script could easily be tweaked to use LVM snapshots as a destination and 
the method is compatible with any block device.

Notes:
  We have rbd_ssd/base-210-disk-X as a protected snapshot (clone parent) and 
then have 4 children where each VM has 3 disks. You would need to create the 
destination images and ensure that their size matches the source images as a 
prerequisite. The following script rotates 3 snapshots each time it runs and 
additionally creates a snapshot of the source images (not the static clone 
parent) before comparing block devices:


#!/bin/bash

src='rbd_ssd';
dst='rbd_hdd';

rbdsnap () {
  [ "x" = "$1"x ] && return 1;
  [ `rbd snap ls $1 | grep -Pc "^\s+\d+\s+$2\s"` -gt 0 ] && return 0 || return 1;
}

# Backup 'template-debian-9.3' (clone parent) - Should never change so no need
# to maintain snapshots or run it on a continual basis:
#for ID in 210; do
#  for f in 1 2 3; do
#    echo -en "\t\t : Copying "$src"/base-"$ID"-disk-"$f"@__base__ to "$dst"/vm-"$ID"-disk-"$f"_backup";
#    qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp -S 4M rbd:"$src"/base-"$ID"-disk-"$f"@__base__ rbd:"$dst"/vm-"$ID"-disk-"$f"_backup;
#  done
#done

# Backup images (clone children):
for ID in 211 212 213 214; do
  for f in 1 2 3; do
    rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap3 &&
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap2 &&
      rbd snap rm "$dst"/vm-"$ID"-disk-"$f"_backup@snap3;
    rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap3 ||
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap2 &&
      rbd snap rename "$dst"/vm-"$ID"-disk-"$f"_backup@snap2 "$dst"/vm-"$ID"-disk-"$f"_backup@snap3;
    rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap2 ||
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap1 &&
      rbd snap rename "$dst"/vm-"$ID"-disk-"$f"_backup@snap1 "$dst"/vm-"$ID"-disk-"$f"_backup@snap2;
    rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap1 ||
      rbd snap create "$dst"/vm-"$ID"-disk-"$f"_backup@snap1;
    rbd snap create "$src"/vm-"$ID"-disk-"$f"@backupinprogress;
  done
  for f in 1 2 3; do
    echo -en "\t\t : Copying "$src"/vm-"$ID"-disk-"$f" to "$dst"/vm-"$ID"-disk-"$f"_backup";
    #qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp -S 4M rbd:"$src"/vm-"$ID"-disk-"$f"@backupinprogress rbd:"$dst"/vm-"$ID"-disk-"$f"_backup;
    export dev1=$(rbd map "$src"/vm-"$ID"-disk-"$f@backupinprogress" --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    export dev2=$(rbd map "$dst"/vm-"$ID"-disk-"$f"_backup --name client.admin \
      -k /etc/pve/priv/ceph.client.admin.keyring);
    perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
      perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};$b=md5($_);
        read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
          perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) {
            seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print $buf}' 1<> $dev2;
    rbd unmap $dev1;
    rbd unmap $dev2;
    rbd snap rm "$src"/vm-"$ID"-disk-"$f"@backupinprogress;
  done
done
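
The snapshot rotation in the script above is easier to reason about when it 
is separated from the rbd calls. The following dry-run sketch only prints the 
commands a three-snapshot rotation would run for a given set of existing 
snapshots. 'rotation_plan' and '_has_snap' are hypothetical helper names, and 
the logic is my reading of the intent of the script (drop the oldest snapshot 
only when a newer one can take its place), not a copy of it:

```shell
# Hypothetical dry-run helper: IMG is the image, the remaining arguments are
# the snapshot names that currently exist on it. Nothing is executed; the
# rbd commands are only printed, oldest snapshot first.
_has_snap() {
  # $1 = space-padded list of existing names, $2 = name to look for
  case "$1" in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

rotation_plan() {
  local img existing
  img=$1; shift
  existing=" $* "
  # Remove the oldest snapshot only if snap2 is there to replace it
  if _has_snap "$existing" snap3 && _has_snap "$existing" snap2; then
    echo "rbd snap rm ${img}@snap3"
  fi
  if _has_snap "$existing" snap2; then
    echo "rbd snap rename ${img}@snap2 ${img}@snap3"
  fi
  if _has_snap "$existing" snap1; then
    echo "rbd snap rename ${img}@snap1 ${img}@snap2"
  fi
  # snap1 is always free at this point
  echo "rbd snap create ${img}@snap1"
}
```

Example: 'rotation_plan rbd_hdd/vm-211-disk-3_backup snap1 snap2' prints the 
two renames followed by the create.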



Commenting out everything from 'export dev1' to 'rbd unmap $dev2' and 
uncommenting the qemu-img command yields the following:
  real    0m48.598s
  user    0m14.583s
  sys     0m10.986s
[admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
NAME                       PROVISIONED   USED
vm-211-disk-3_backup@snap3      20480M  2764M
vm-211-disk-3_backup@snap2      20480M  2764M
vm-211-disk-3_backup@snap1      20480M  2764M
vm-211-disk-3_backup            20480M  2764M
<TOTAL>                         20480M 11056M

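The TOTAL row of an 'rbd du' listing is the sum of the USED column, so four 
entries of 2764M each add up to 11056M. A small sketch for summing that column 
from 'rbd du' output, assuming every value is reported in M units; 
'sum_used_mib' is a hypothetical helper name:

```shell
# Hypothetical helper: read 'rbd du' style rows on stdin and print the sum of
# the USED column. The header and <TOTAL> rows are skipped, as is any value
# that is not of the form <digits>M (for example a bare 0).
sum_used_mib() {
  awk '$1 != "NAME" && $1 != "<TOTAL>" && $NF ~ /^[0-9]+M$/ {
         total += $NF + 0     # "2764M" + 0 converts to the number 2764
       }
       END { printf "%dM\n", total }'
}
```

Example: 'rbd du rbd_hdd/vm-211-disk-3_backup 2> /dev/null | sum_used_mib'.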

Repeating the copy using the Perl solution is much slower but as the VM is 
currently off nothing has changed and each snapshot consumes zero data:
  real    1m49.000s
  user    1m34.339s
  sys     0m17.847s
[admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
warning: fast-diff map is not enabled for vm-211-disk-3_backup. operation may be slow.
NAME                       PROVISIONED  USED
vm-211-disk-3_backup@snap3      20480M 2764M
vm-211-disk-3_backup@snap2      20480M     0
vm-211-disk-3_backup@snap1      20480M     0
vm-211-disk-3_backup            20480M     0
<TOTAL>                         20480M 2764M


PS: Not sure if this is a Ceph display bug. Why would the image itself be 
reported as not consuming any data while the oldest snapshot (rotated to 
'snap3') reports all the usage? Purging all snapshots yields the following:
[admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
warning: fast-diff map is not enabled for vm-211-disk-3_backup. operation may be slow.
NAME                 PROVISIONED  USED
vm-211-disk-3_backup      20480M 2764M


Regards
David Herselman
