Hello, happy Ceph users. With this test I want to understand how Ceph protects
my data and what I have to do in certain failure situations. So let's begin.
== Preparation
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
The cluster consists of:
MON: 3
OSD: 3
File system: ZFS
Kernel: 4.2.6
Preparing the pool:
# ceph osd pool create rbd 100
pool 'rbd' created
# ceph osd pool set rbd size 3
set pool 16 size to 3
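Just to make the replication settings explicit, a quick check like this should work (the expected min_size value is my assumption based on the defaults):
# ceph osd pool get rbd size        # expect: size: 3
# ceph osd pool get rbd min_size    # with the defaults this should report min_size: 2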
RBD client
# rbd create test --size 4G
# rbd map test
/dev/rbd0
# mkfs.ext2 /dev/rbd0
# mount /dev/rbd0 /mnt
# printf "aaaaaaaaaa\nbbbbbbbbbb" > /mnt/file
Searching for the PG that holds our file:
# grep "aaaaaaaaa" * -R
Binary file osd/nmz-0-journal/journal matches
Binary file osd/nmz-1/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-2/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-1-journal/journal matches
Binary file osd/nmz-0/current/16.22_head/rbd\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10 matches
Binary file osd/nmz-2-journal/journal matches
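As a cross-check, the cluster itself can be asked where that object lives (a sketch, using the object name from the grep output):
# ceph osd map rbd rbd_data.1a72a39011461.0000000000000001    # should report pg 16.22 on up/acting [2,1,0]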
PG info
# ceph pg ls
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
16.22 1 0 0 0 0 8192 2 2 active+clean 2016-02-19 08:46:11.157938 242'2 242:14 [2,1,0] 2 [2,1,0] 2 0'0 2016-02-19 08:45:38.006134 0'0 2016-02-19 08:45:38.006134
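The same mapping for a single PG can also be fetched directly (sketch):
# ceph pg map 16.22        # should print up [2,1,0] acting [2,1,0]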
The primary copy of the PG is on osd.2. Let's checksum the file:
# md5sum
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
== Failure imitation #1
Let's corrupt the replica copies of the PG:
# sed -i -r 's/aaaaaaaaaa/abaaaaaaaa/g'
osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# sed -i -r 's/aaaaaaaaaa/acaaaaaaaa/g'
osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# md5sum
osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\99555c6c3ed07550b5fdfd2411b94fdd
osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd
osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
Let's run a scrub to find the corruption:
# ceph osd scrub 0
7f8732f33700 0 log_channel(cluster) log [INF] : 16.63 scrub starts
7f873072e700 0 log_channel(cluster) log [INF] : 16.63 scrub ok
....
7f8732732700 0 log_channel(cluster) log [INF] : 16.2d scrub starts
7f8734f37700 0 log_channel(cluster) log [INF] : 16.2d scrub ok
7f8730f2f700 0 log_channel(cluster) log [INF] : 16.2b scrub starts
7f8733734700 0 log_channel(cluster) log [INF] : 16.2b scrub ok
7f8731730700 0 log_channel(cluster) log [INF] : 16.2a scrub starts
7f8733f35700 0 log_channel(cluster) log [INF] : 16.2a scrub ok
7f8733f35700 0 log_channel(cluster) log [INF] : 16.25 scrub starts
7f8731730700 0 log_channel(cluster) log [INF] : 16.25 scrub ok
7f8733f35700 0 log_channel(cluster) log [INF] : 16.20 scrub starts
7f8731730700 0 log_channel(cluster) log [INF] : 16.20 scrub ok
....
7f8734f37700 0 log_channel(cluster) log [INF] : 16.0 scrub ok
The scrub did not touch PG 16.22. The same happens with osd.1.
# ceph osd deep-scrub 0
Same result. scrub vs deep-scrub -- time to google?
# ceph pg scrub 16.22
instructing pg 16.22 on osd.2 to scrub
Only the PGs for which the OSD is primary get checked.
So I don't know how to make Ceph check all the PGs on an OSD.
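A workaround I could try (just a sketch, untested, and assuming ceph pg ls-by-osd exists in this release): list every PG that the OSD participates in and issue a per-PG deep-scrub for each of them:
# ceph pg ls-by-osd 0 | awk '$1 ~ /^[0-9a-f]+\./ {print $1}' | \
      while read pg; do ceph pg deep-scrub "$pg"; done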
== Failure imitation #2
Let's change which copies are corrupted: make osd.0's copy good again and leave the others corrupted, now including the primary on osd.2:
# sed -i -r 's/aaaaaaaaaa/adaaaaaaaa/g'
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# md5sum
osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447
osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd
osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
# ceph osd scrub 2
osd.2 instructed to scrub
7f5e8b686700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f5e88e81700 0 log_channel(cluster) log [INF] : 16.22 scrub ok
No error detection?
# ceph osd deep-scrub 2
osd.2 instructed to deep-scrub
7f5e88e81700 0 log_channel(cluster) log [INF] : 16.22 deep-scrub starts
7f5e8b686700 0 log_channel(cluster) log [INF] : 16.22 deep-scrub ok
Still no error detected? Let's check the file with md5:
# md5sum
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
Does the OSD use a cache? Let's restart osd.2.
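My guess here, and it is only a guess: sed -i replaces the file with a new inode, while the running OSD may still hold the old file open (FileStore keeps a file-descriptor cache), so scrub keeps reading the untouched data. Restarting the daemon should drop those handles; depending on the init system, something like:
# systemctl restart ceph-osd@2          # or: /etc/init.d/ceph restart osd.2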
-- After a successful restart
# ceph pg scrub 16.22
instructing pg 16.22 on osd.2 to scrub
7fc475e31700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 shard 2: soid
16/a7e34aa2/rbd_data.1a72a39011461.0000000000000001/head missing attr _,
missing attr snapset
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 scrub 0 missing, 1
inconsistent objects
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 scrub 1 errors
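The "missing attr" part would make sense if, as I assume, sed -i recreates the file and drops its extended attributes, which FileStore uses to store the object info (user.ceph._) and the snapset (user.ceph.snapset). That should be visible with getfattr (a sketch):
# getfattr -d -m - \
    osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10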
# ceph -s
cluster 26fdb24b-9004-4e2b-a8d7-c28f45464084
health HEALTH_ERR
1 pgs inconsistent
1 scrub errors
monmap e7: 3 mons at
{a=10.10.8.1:6789/0,b=10.10.8.1:6790/0,c=10.10.8.1:6791/0}
election epoch 60, quorum 0,1,2 a,b,c
osdmap e250: 3 osds: 3 up, 3 in
flags sortbitwise
pgmap v3172: 100 pgs, 1 pools, 143 MB data, 67 objects
101 MB used, 81818 MB / 81920 MB avail
99 active+clean
1 active+clean+inconsistent
No automatic healing?
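At least the affected PG can be located from the health output (sketch):
# ceph health detail        # should name the inconsistent PG, i.e. 16.22 here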
# ceph pg repair 16.22
instructing pg 16.22 on osd.2 to repair
7fc475e31700 0 log_channel(cluster) log [INF] : 16.22 repair starts
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 shard 2: soid
16/a7e34aa2/rbd_data.1a72a39011461.0000000000000001/head data_digest 0xd444e973
!= known data_digest 0xb9b5bcf4 from auth shard 0, missing attr _, missing attr
snapset
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 repair 0 missing, 1
inconsistent objects
7fc478636700 -1 log_channel(cluster) log [ERR] : 16.22 repair 1 errors, 1 fixed
Let's checksum the copies again:
# md5sum
osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447
osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd
osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
The primary copy is fixed, but the copy on osd.1 is left unchanged.
-- Tuning
Let's change the PG's primary OSD:
# ceph tell mon.* injectargs -- --mon_osd_allow_primary_temp=true
mon.a: injectargs:mon_osd_allow_primary_temp = 'true'
mon.b: injectargs:mon_osd_allow_primary_temp = 'true'
mon.c: injectargs:mon_osd_allow_primary_temp = 'true'
# ceph osd primary-temp 16.22 1
set 16.22 primary_temp mapping to 1
# ceph osd scrub 1
osd.1 instructed to scrub
7f8a909a2700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f8a931a7700 0 log_channel(cluster) log [INF] : 16.22 scrub ok
No detection
# ceph pg scrub 16.22
instructing pg 16.22 on osd.1 to scrub
7f8a931a7700 0 log_channel(cluster) log [INF] : 16.22 scrub starts
7f8a909a2700 0 log_channel(cluster) log [INF] : 16.22 scrub ok
Still nothing. Let's check md5 again:
# md5sum
osd/nmz-*/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\95818f285434d626ab26255410f9a447
osd/nmz-0/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\8cf7cc66d7f0dc7804fbfef492bcacfd
osd/nmz-1/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
\852a51b44552ffbb2b0350966c9aa3b2
osd/nmz-2/current/16.22_head/rbd\\udata.1a72a39011461.0000000000000001__head_A7E34AA2__10
The file is still corrupted.
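One thing I should probably still try from the new primary is a deep scrub: as far as I understand, a plain scrub only compares object sizes and metadata between replicas, while a deep scrub actually reads and checksums the data. A sketch:
# ceph pg deep-scrub 16.22
# ceph pg repair 16.22      # once the deep scrub has flagged the bad copy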
So my questions are:
1. How do I scrub a whole OSD, not just part of it?
2. Why does scrub not detect the corrupted files?
3. Does Ceph have an auto-heal option?
4. Does Ceph use some CRC mechanism to detect corrupted bits before returning data?