Am Montag, 11. Juni 2012, 09:30:42 schrieb Sage Weil:
If you can reproduce it with 'debug filestore = 20' too, that will be
better, as it will tell us what the FIEMAP ioctl is returning.
I ran another testrun with 'debug filestore = 20'.
Also, if
you can attach/post the contents of the object itself (rados -p rbd get
rb.0.1.0000000002a0 /tmp/foo) we can make sure the object has the right
data (and the sparse-read operation that librbd is doing is the culprit).
I tried that, with the block name that the steps further below gave me:
rados -p rbd get rb.0.13.00000000045a block
When I looked into the block, it looked like a bunch of temp files from the
portage system with padding in between, although it should be random data... I
think I got the wrong block after all...
Here's what I did:
Run the iotester again:
testserver-rbd11 iotester # date ; ./iotester /var/iotest ; date
Tue Jun 12 13:51:58 CEST 2012
Wrote 100 MiB of data in 2004 milliseconds
[snip lots of irrelevant lines]
Wrote 100 MiB of data in 2537 milliseconds
Read 100 MiB of data in 3794 milliseconds
Read 100 MiB of data in 10150 milliseconds
Digest wrong for file "/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e"
Tue Jun 12 13:55:00 CEST 2012
Run the fiemap tool on that file:
testserver-rbd11 ~ # ./fiemap
/var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e
File /var/iotest/4299a48eca63c75d6773bec3565190aa3b33c46e has 1 extents:
# Logical Physical Length Flags
0: 0000000000000000 0000000116900000 0000000000100000 0001
As for the log:
First, map the offset to an rbd block. For example, taking the 'Physical'
value of 00000000a8200000 from above:
$ printf "%012x\n" $((0x00000000a8200000 / (4096*1024) ))
0000000002a0
That gave me
$ printf "%012x\n" $((0x0000000116900000 / (4096*1024) ))
00000000045a
Then figure out what the object name prefix is:
$ rbd info<imagename> | grep prefix
block_name_prefix: rb.0.1
Result: block_name_prefix: rb.0.13
Then add the block number, 0000000002a0 to that, e.g. rb.0.1.0000000002a0.
Result: rb.0.13.00000000045a
Then map that back to an osd with
$ ceph osd map rbd rb.0.1.0000000002a0
osdmap e19 pool 'rbd' (2) object 'rb.0.1.0000000002a0' -> pg 2.a2e06f65
(2.5) -> up [0,2] acting [0,2]
That gives me
[root@storage1 ~]# ceph osd map rbd rb.0.13.00000000045a 2> /dev/null
osdmap e101 pool 'rbd' (2) object 'rb.0.13.00000000045a' -> pg 2.80b039fb
(2.7b) -> up [2,1] acting [2,1]
You'll see the osd ids listed in brackets after 'active'. We want the
first one, 0 in my example. The log from that OSD is what we need.
Okay, i'm attaching the compressed log for osd.2 and the compressed block to
the issue report in the redmine.
Guido