The --debug option indeed comes up with something:
bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device 
location [0x15a0170000~1000], logical extent 0x0~1000,
 bluestore(/var/lib/ceph/osd/ceph-9) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0xb40b26a7, expected 0x90407f75, device 
location [0x2daea0000~1000], logical extent 0x0~1000,

I don't know how to interpret this, but am I correct in understanding that 
data has been written across the cluster to these 3 OSDs, and all 3 have 
somehow received something different?
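
For reference, here is a minimal sketch (plain Python, not the actual BlueStore code) of what _verify_csum is doing. BlueStore stores a CRC-32C (Castagnoli polynomial) per 0x1000-byte block at write time; on read it re-hashes the block and compares. A mismatch means the bytes on disk are no longer the bytes that were written:

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value for the test vector "123456789"
assert crc32c(b"123456789") == 0xE3069283

# Conceptually, _verify_csum does this for each 0x1000-byte block,
# comparing against the checksum stored in the onode at write time:
def verify_block(block: bytes, stored_csum: int) -> bool:
    return crc32c(block) == stored_csum
```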


size=4194304
object_info: 
17:6ca10b29:::rbd_data.1fff61238e1f29.0000000000009923:head(5387'35157 
client.2096993.0:78941 dirty|data_digest|omap_digest s 4194304 uv 35356 
dd f53dff2e od ffffffff alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.00000000000011b4:head#
size=4194304
object_info: 
17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.00000000000011b4:head(5163'7136 
client.2074638.1:483264 dirty|data_digest|omap_digest s 4194304 uv 7418 
dd 43d61c5d od ffffffff alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca13bed:::rbd_data.1f114174b0dc51.00000000000002c6:head#
size=4194304
object_info: 
17:6ca13bed:::rbd_data.1f114174b0dc51.00000000000002c6:head(5236'7640 
client.2074638.1:704364 dirty|data_digest|omap_digest s 4194304 uv 7922 
dd 3bcff64d od ffffffff alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca1a791:::rbd_data.1fff61238e1f29.000000000000f101:head#
size=4194304
object_info: 
17:6ca1a791:::rbd_data.1fff61238e1f29.000000000000f101:head(5387'35553 
client.2096993.0:123721 dirty|data_digest|omap_digest s 4194304 uv 35752 
dd f9bc0fbd od ffffffff alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4#
size=4194304
object_info: 
17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4(5390'56613 
client.2096907.1:3222443 dirty|omap_digest s 4194304 uv 55477 od 
ffffffff alloc_hint [0 0 0])
2017-08-08 15:57:45.078348 7fad08fa4100 -1 
bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0x100ac314, expected 0x90407f75, device 
location [0x15a0170000~1000], logical extent 0x0~1000, object 
#17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4#
export_files error -5
2017-08-08 15:57:45.081279 7fad08fa4100  1 
bluestore(/var/lib/ceph/osd/ceph-12) umount
2017-08-08 15:57:45.150210 7fad08fa4100  1 freelist shutdown
2017-08-08 15:57:45.150307 7fad08fa4100  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] 
Shutdown: canceling all background work
2017-08-08 15:57:45.152099 7fad08fa4100  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown 
complete
2017-08-08 15:57:45.184742 7fad08fa4100  1 bluefs umount
2017-08-08 15:57:45.203674 7fad08fa4100  1 bdev(0x7fad0b260e00 
/var/lib/ceph/osd/ceph-12/block) close
2017-08-08 15:57:45.442499 7fad08fa4100  1 bdev(0x7fad0b0a5a00 
/var/lib/ceph/osd/ceph-12/block) close

grep -i export_files strace.out -C 10

814  16:08:19.261144 futex(0x7fffea9378c0, FUTEX_WAKE_PRIVATE, 1) = 0 
<0.000010>
6814  16:08:19.261242 futex(0x7f4832bb60bc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 
0x7f4832bb60b8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000012>
6814  16:08:19.261281 madvise(0x7f4843bf0000, 524288, MADV_DONTNEED 
<unfinished ...>
6815  16:08:19.261382 <... futex resumed> ) = 0 <14.990766>
6814  16:08:19.261412 <... madvise resumed> ) = 0 <0.000123>
6814  16:08:19.261446 madvise(0x7f4843b70000, 1048576, MADV_DONTNEED 
<unfinished ...>
6815  16:08:19.261474 futex(0x7f4832bb6038, FUTEX_WAKE_PRIVATE, 1 
<unfinished ...>
6814  16:08:19.261535 <... madvise resumed> ) = 0 <0.000067>
6815  16:08:19.261557 <... futex resumed> ) = 0 <0.000069>
6815  16:08:19.261647 futex(0x7f4832bb60bc, FUTEX_WAIT_PRIVATE, 45, NULL 
<unfinished ...>
6814  16:08:19.261700 write(2</dev/pts/0>, "export_files error ", 19) = 
19 <0.000024>
6814  16:08:19.261774 write(2</dev/pts/0>, "-5", 2) = 2 <0.000018>
6814  16:08:19.261841 write(2</dev/pts/0>, "\n", 1) = 1 <0.000016>
6814  16:08:19.262191 madvise(0x7f4839106000, 16384, MADV_DONTNEED) = 0 
<0.000015>
6814  16:08:19.262229 madvise(0x7f483914e000, 16384, MADV_DONTNEED) = 0 
<0.000012>
6814  16:08:19.262295 madvise(0x7f48389e6000, 49152, MADV_DONTNEED) = 0 
<0.000013>
6814  16:08:19.262498 madvise(0x7f48390ea000, 16384, MADV_DONTNEED) = 0 
<0.000013>
6814  16:08:19.262538 madvise(0x7f48390ce000, 16384, MADV_DONTNEED) = 0 
<0.000012>
6814  16:08:19.262580 madvise(0x7f483c228000, 24576, MADV_DONTNEED) = 0 
<0.000012>
6814  16:08:19.263047 madvise(0x7f48393d8000, 16384, MADV_DONTNEED) = 0 
<0.000013>
6814  16:08:19.263081 madvise(0x7f48393d8000, 32768, MADV_DONTNEED) = 0 
<0.000016>


I was curious how this would compare to osd.9:

object_info: 
17:6ca13bed:::rbd_data.1f114174b0dc51.00000000000002c6:head(5236'7640 
client.2074638.1:704364 dirty|data_digest|omap_digest s 4194304 uv 7922 
dd 3bcff64d od ffffffff alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca1a791:::rbd_data.1fff61238e1f29.000000000000f101:head#
size=4194304
object_info: 
17:6ca1a791:::rbd_data.1fff61238e1f29.000000000000f101:head(5387'35553 
client.2096993.0:123721 dirty|data_digest|omap_digest s 4194304 uv 35752 
dd f9bc0fbd od ffffffff alloc_hint [4194304 4194304 0])
data section offset=0 len=1048576
data section offset=1048576 len=1048576
data section offset=2097152 len=1048576
data section offset=3145728 len=1048576
attrs size 2
omap map size 0
Read #17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4#
size=4194304
object_info: 
17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4(5390'56613 
client.2096907.1:3222443 dirty|omap_digest s 4194304 uv 55477 od 
ffffffff alloc_hint [0 0 0])
2017-08-08 16:22:00.893216 7f94e10f5100 -1 
bluestore(/var/lib/ceph/osd/ceph-9) _verify_csum bad crc32c/0x1000 
checksum at blob offset 0x0, got 0xb40b26a7, expected 0x90407f75, device 
location [0x2daea0000~1000], logical extent 0x0~1000, object 
#17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4#
export_files error -5
2017-08-08 16:22:00.895439 7f94e10f5100  1 
bluestore(/var/lib/ceph/osd/ceph-9) umount
2017-08-08 16:22:00.963774 7f94e10f5100  1 freelist shutdown
2017-08-08 16:22:00.963861 7f94e10f5100  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:217] 
Shutdown: canceling all background work
2017-08-08 16:22:00.968438 7f94e10f5100  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_AR
CH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/
12.1.1/rpm/el7/BUILD/ceph-12.1.1/src/rocksdb/db/db_impl.cc:343] Shutdown 
complete
2017-08-08 16:22:00.984583 7f94e10f5100  1 bluefs umount
2017-08-08 16:22:01.026784 7f94e10f5100  1 bdev(0x7f94e3670e00 
/var/lib/ceph/osd/ceph-9/block) close
2017-08-08 16:22:01.243361 7f94e10f5100  1 bdev(0x7f94e34b5a00 
/var/lib/ceph/osd/ceph-9/block) close


23555 16:26:31.336061 io_getevents(139955679129600, 1, 16,  <unfinished 
...>
23552 16:26:31.336081 futex(0x7ffe7e4c9210, FUTEX_WAKE_PRIVATE, 1) = 0 
<0.000155>
23552 16:26:31.336452 futex(0x7f49fb4d20bc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 
0x7f49fb4d20b8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000129>
23553 16:26:31.336637 <... futex resumed> ) = 0 <16.434259>
23553 16:26:31.336758 futex(0x7f49fb4d2038, FUTEX_WAKE_PRIVATE, 1 
<unfinished ...>
23552 16:26:31.336801 madvise(0x7f4a0cafa000, 2555904, MADV_DONTNEED 
<unfinished ...>
23553 16:26:31.336915 <... futex resumed> ) = 0 <0.000113>
23552 16:26:31.336959 <... madvise resumed> ) = 0 <0.000148>
23553 16:26:31.337040 futex(0x7f49fb4d20bc, FUTEX_WAIT_PRIVATE, 55, NULL 
<unfinished ...>
23552 16:26:31.337070 madvise(0x7f4a0ca7a000, 3080192, MADV_DONTNEED) = 
0 <0.000180>
23552 16:26:31.337424 write(2</dev/pts/1>, "export_files error ", 19) = 
19 <0.000104>
23552 16:26:31.337615 write(2</dev/pts/1>, "-5", 2) = 2 <0.000017>
23552 16:26:31.337674 write(2</dev/pts/1>, "\n", 1) = 1 <0.000037>
23552 16:26:31.338270 madvise(0x7f4a01ae4000, 16384, MADV_DONTNEED) = 0 
<0.000020>
23552 16:26:31.338320 madvise(0x7f4a018cc000, 49152, MADV_DONTNEED) = 0 
<0.000014>
23552 16:26:31.338561 madvise(0x7f4a0770a000, 24576, MADV_DONTNEED) = 0 
<0.000015>
23552 16:26:31.339161 madvise(0x7f4a02102000, 16384, MADV_DONTNEED) = 0 
<0.000015>
23552 16:26:31.339201 madvise(0x7f4a02132000, 16384, MADV_DONTNEED) = 0 
<0.000013>
23552 16:26:31.339235 madvise(0x7f4a02102000, 32768, MADV_DONTNEED) = 0 
<0.000014>
23552 16:26:31.339331 madvise(0x7f4a01df8000, 16384, MADV_DONTNEED) = 0 
<0.000019>
23552 16:26:31.339372 madvise(0x7f4a01df8000, 32768, MADV_DONTNEED) = 0 
<0.000013>
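
Putting the two _verify_csum lines side by side: both OSDs disagree with the stored checksum, and also with each other, so each replica appears to hold different on-disk data for that first 4 KiB block (values copied verbatim from the logs above):

```python
# Checksums reported for blob offset 0x0 of
# rbd_data.1f114174b0dc51.0000000000000974:4 (from the logs above)
expected  = 0x90407f75  # stored at write time
osd12_got = 0x100ac314  # computed from osd.12's on-disk block
osd9_got  = 0xb40b26a7  # computed from osd.9's on-disk block

assert osd12_got != expected   # osd.12's copy fails verification
assert osd9_got != expected    # osd.9's copy fails verification
assert osd12_got != osd9_got   # and the two replicas differ from each other
```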


-----Original Message-----
From: Brad Hubbard [mailto:bhubb...@redhat.com] 
Sent: 07 August 2017 02:34
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Pg inconsistent / export_files error -5



On Sat, Aug 5, 2017 at 1:21 AM, Marc Roos <m.r...@f1-outsourcing.eu> 
wrote:
>
> I have got a placement group inconsistency, and saw some manual where 
> you can export and import this on another osd. But I am getting an 
> export error on every osd.
>
> What does this export_files error -5 actually mean? I thought 3 copies

#define EIO              5      /* I/O error */
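
(You can confirm the mapping from errno number to message on any Linux box, e.g. in Python:)

```python
import errno
import os

assert errno.EIO == 5
print(os.strerror(errno.EIO))  # Input/output error
```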

> should be enough to secure your data.
>
>
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>    pg 17.36 is active+clean+inconsistent, acting [9,0,12]
>
>
>> 2017-08-04 05:39:51.534489 7f2f623d6700 -1 log_channel(cluster) log
> [ERR] : 17.36 soid
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4: failed to 
> pick suitable object info
>> 2017-08-04 05:41:12.715393 7f2f623d6700 -1 log_channel(cluster) log
> [ERR] : 17.36 deep-scrub 3 errors
>> 2017-08-04 15:21:12.445799 7f2f623d6700 -1 log_channel(cluster) log
> [ERR] : 17.36 soid
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4: failed to 
> pick suitable object info
>> 2017-08-04 15:22:35.646635 7f2f623d6700 -1 log_channel(cluster) log
> [ERR] : 17.36 repair 3 errors, 0 fixed
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --pgid 
> 17.36 --op export --file /tmp/recover.17.36

Can you run this command under strace like so?

# strace -fvttyyTo /tmp/strace.out -s 1024 ceph-objectstore-tool 
--data-path /var/lib/ceph/osd/ceph-12 --pgid 17.36 --op export --file 
/tmp/recover.17.36

Then see if you can find which syscall is returning EIO.

# grep "= \-5" /tmp/strace.out

>
> ...
> Read #17:6c9f811c:::rbd_data.1b42f52ae8944a.0000000000001a32:head#
> Read #17:6ca035fc:::rbd_data.1fff61238e1f29.000000000000b31a:head#
> Read #17:6ca0b4f8:::rbd_data.1fff61238e1f29.0000000000006fcc:head#
> Read #17:6ca0ffbc:::rbd_data.1fff61238e1f29.000000000000a214:head#
> Read #17:6ca10b29:::rbd_data.1fff61238e1f29.0000000000009923:head#
> Read #17:6ca11ab9:::rbd_data.1fa8ef2ae8944a.00000000000011b4:head#
> Read #17:6ca13bed:::rbd_data.1f114174b0dc51.00000000000002c6:head#
> Read #17:6ca1a791:::rbd_data.1fff61238e1f29.000000000000f101:head#
> Read #17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4#
> export_files error -5

Running the command with "--debug" appended will give more output which 
may shed more light as well.

> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Cheers,
Brad

