Really sorry for the bad formatting; I am posting it here again.
I found data loss when flattening a cloned image on giant (0.87.2). The problem
can be easily reproduced by running the following script:
#!/bin/bash
ceph osd pool create wuxingyi 1 1
rbd create --image-format 2 wuxingyi/disk1.img --size 8
#writing "FOOBAR" at offset 0
python writetooffset.py disk1.img 0 FOOBAR
rbd snap create wuxingyi/disk1.img@SNAPSHOT
rbd snap protect wuxingyi/disk1.img@SNAPSHOT
echo "start cloning"
rbd clone wuxingyi/disk1.img@SNAPSHOT wuxingyi/CLONEIMAGE
#writing "WUXINGYI" at offset 4M of cloned image
python writetooffset.py CLONEIMAGE $((4*1048576)) WUXINGYI
rbd snap create wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT
#modify at offset 4M of cloned image
python writetooffset.py CLONEIMAGE $((4*1048576)) HEHEHEHE
echo "start flattening CLONEIMAGE"
rbd flatten wuxingyi/CLONEIMAGE
echo "before rollback"
rbd export wuxingyi/CLONEIMAGE && hexdump -C CLONEIMAGE
rm CLONEIMAGE -f
rbd snap rollback wuxingyi/CLONEIMAGE@CLONEDSNAPSHOT
echo "after rollback"
rbd export wuxingyi/CLONEIMAGE && hexdump -C CLONEIMAGE
rm CLONEIMAGE -f
where writetooffset.py is a simple Python script that writes the given data at
the given offset of the image:
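#!/usr/bin/python
#coding=utf-8
# usage: writetooffset.py <image> <offset> <data>
import sys
import rbd
import rados
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('wuxingyi')
image = rbd.Image(ioctx, sys.argv[1])
try:
    # write <data> at byte <offset> of the image
    image.write(sys.argv[3], int(sys.argv[2]))
finally:
    # release the image, the pool context and the cluster handle
    image.close()
    ioctx.close()
    cluster.shutdown()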
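To make it clear which object each write lands in, here is a minimal sketch of the offset-to-object arithmetic. It assumes the default 4 MiB object size (order 22), which the script above does not override; the helper names are made up for illustration, and the 16-digit hex suffix is what I believe format 2 data object names end with.

```python
OBJECT_SIZE = 1 << 22  # 4 MiB, assumed default object size (order 22)


def object_for_offset(offset, object_size=OBJECT_SIZE):
    """Return (object number, offset inside that object) for an image offset."""
    return divmod(offset, object_size)


def object_suffix(objno):
    """Hypothetical helper: 16-digit hex suffix assumed for data object names."""
    return "%016x" % objno


# The two offsets used by the reproduction script.
for off in (0, 4 * 1048576):
    objno, inner = object_for_offset(off)
    print(off, objno, inner, object_suffix(objno))
```

So "FOOBAR" lives in the first object (number 0) and "WUXINGYI"/"HEHEHEHE" in the second object (number 1) of the 8 MB image.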
The output is something like:
before rollback
Exporting image: 100% complete...done.
00000000 46 4f 4f 42 41 52 00 00 00 00 00 00 00 00 00 00 |FOOBAR..........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00400000 48 45 48 45 48 45 48 45 00 00 00 00 00 00 00 00 |HEHEHEHE........|
00400010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00800000
Rolling back to snapshot: 100% complete...done.
after rollback
Exporting image: 100% complete...done.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00400000 57 55 58 49 4e 47 59 49 00 00 00 00 00 00 00 00 |WUXINGYI........|
00400010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00800000
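What I expect after the rollback is "FOOBAR" at offset 0 (inherited from the parent snapshot) and "WUXINGYI" at offset 4M. A minimal sketch of checking an exported file against that expectation, without reading the hexdump by eye, could look like this. The function names are mine, and demo_observed_output() only rebuilds the "after rollback" dump shown above in a temporary file; it does not talk to a cluster.

```python
import os
import tempfile

MIB = 1024 * 1024

# Expected content after rolling back to CLONEDSNAPSHOT (offset -> bytes).
EXPECTED_AFTER_ROLLBACK = {0: b"FOOBAR", 4 * MIB: b"WUXINGYI"}


def find_mismatches(path, expected):
    """Return a list of (offset, expected, actual) for every mismatch."""
    mismatches = []
    with open(path, "rb") as f:
        for offset, want in sorted(expected.items()):
            f.seek(offset)
            got = f.read(len(want))
            if got != want:
                mismatches.append((offset, want, got))
    return mismatches


def demo_observed_output():
    """Rebuild the 'after rollback' dump from this post and check it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.truncate(8 * MIB)   # 8 MiB of zeros, like the exported image
            f.seek(4 * MIB)
            f.write(b"WUXINGYI")  # the only data left after the rollback
        return find_mismatches(path, EXPECTED_AFTER_ROLLBACK)
    finally:
        os.remove(path)
```

On a real run the check would be find_mismatches("CLONEIMAGE", EXPECTED_AFTER_ROLLBACK) right after the second rbd export; an empty list means the rollback result is correct.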
We can easily see that the first object of the image is lost after the
rollback. I found that the data loss happens during flattening: only a "head"
version of the first object is written, whereas a "snapid" version of the
object should also be created and written when flattening.
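To illustrate what I mean, here is a toy in-memory model of the sequence. It is only a sketch of my understanding, not how librbd or the OSD actually store things: ToyClone and the write_snap_versions flag are invented for this example, and each object is reduced to a single bytes value.

```python
class ToyClone:
    """Toy stand-in for a cloned image; NOT the real librbd implementation."""

    def __init__(self, parent_objects):
        self.parent = dict(parent_objects)  # parent snapshot; None after flatten
        self.head = {}                      # object number -> data in "head"
        self.snaps = {}                     # snapshot name -> {object number -> data}

    def write(self, objno, data):
        self.head[objno] = data

    def snap_create(self, name):
        # A snapshot only records objects the clone itself owns at that time;
        # everything else is still expected to come from the parent.
        self.snaps[name] = dict(self.head)

    def flatten(self, write_snap_versions):
        for objno, data in self.parent.items():
            self.head.setdefault(objno, data)      # copy parent data into head
            if write_snap_versions:
                for objs in self.snaps.values():   # also keep it for old snapshots
                    objs.setdefault(objno, data)
        self.parent = None                         # parent link is gone

    def snap_rollback(self, name):
        self.head = dict(self.snaps[name])

    def read(self, objno):
        if objno in self.head:
            return self.head[objno]
        if self.parent is not None and objno in self.parent:
            return self.parent[objno]
        return b""                                 # missing object reads as zeros


def run(write_snap_versions):
    clone = ToyClone({0: b"FOOBAR"})
    clone.write(1, b"WUXINGYI")
    clone.snap_create("CLONEDSNAPSHOT")
    clone.write(1, b"HEHEHEHE")
    clone.flatten(write_snap_versions)
    before = (clone.read(0), clone.read(1))
    clone.snap_rollback("CLONEDSNAPSHOT")
    after = (clone.read(0), clone.read(1))
    return before, after
```

With run(False) the model gives the same result as the output above (object 0 is empty after the rollback); with run(True) object 0 still holds "FOOBAR" after the rollback, which is the behaviour I expected.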
However, when running this script against upstream code, I cannot reproduce the
problem. I looked through the upstream code but could not find which commit
fixes this bug. I also noticed that the whole state machine dealing with RBD
layering has changed a lot since the giant release.
Could you please give me some hints on which commits I should backport?
Thanks!
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com