Robin,
The only two changesets I can spot in Jewel that I think might be
related are these:
1.
http://tracker.ceph.com/issues/20089
https://github.com/ceph/ceph/pull/15416
This should improve the repair functionality.
2.
http://tracker.ceph.com/issues/19404
https://github.com/ceph/ceph/pull/14204
This pull request fixes an issue that corrupted omaps. It also finds
and repairs them. However, the repair process might resurrect deleted
omaps which would show up as an omap digest error.
This could temporarily cause additional inconsistent PGs. So if the
errors have NOT been occurring for longer than your deep-scrub interval
since upgrading, I'd repair the PGs and keep monitoring to make sure the
problem doesn't recur.
---------------
You have good examples of two repair scenarios:
.dir.default.292886573.13181.12 only has an omap_digest_mismatch and no
shard errors, so the automatic repair can't be sure which copy is good.
In this case we can see that osd 1327 doesn't match the other two. To
help the repair process fix the right copy, remove the one on
osd.1327:
Stop osd 1327 and use "ceph-objectstore-tool --data-path .....1327
.dir.default.292886573.13181.12 remove"
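
The manual check described above (spotting the shard whose omap_digest
disagrees with the others) can be sketched in Python. This is only an
illustration of the by-eye comparison, not how the pg repair code
decides; the function name find_outlier_osds is hypothetical, and the
input is a trimmed copy of the 'rados list-inconsistent-obj' JSON
quoted in this thread.

```python
from collections import Counter


def find_outlier_osds(report):
    """Map each inconsistent object to the OSDs whose omap_digest is in the minority.

    The value is None when no digest is held by a strict majority of
    shards, because a simple vote cannot pick a good copy in that case.
    """
    result = {}
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        if not shards:
            continue
        counts = Counter(s["omap_digest"] for s in shards)
        digest, votes = counts.most_common(1)[0]
        name = item["object"]["name"]
        if votes * 2 <= len(shards):
            result[name] = None  # no strict majority, vote is inconclusive
        else:
            result[name] = sorted(
                s["osd"] for s in shards if s["omap_digest"] != digest
            )
    return result


# Trimmed copy of the 5.f1c0 output quoted in this thread.
report = {
    "inconsistents": [
        {
            "object": {"name": ".dir.default.292886573.13181.12"},
            "shards": [
                {"osd": 91, "omap_digest": "0x928b0c0b"},
                {"osd": 631, "omap_digest": "0x928b0c0b"},
                {"osd": 1327, "omap_digest": "0x6556c868"},
            ],
        }
    ]
}

print(find_outlier_osds(report))
```

For the 5.f1c0 data this prints
{'.dir.default.292886573.13181.12': [1327]}, matching the observation
that osd 1327 is the odd one out. A majority vote is only a hint for
the operator; it does not prove the majority copies are correct.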
.dir.default.64449186.344176 has selected_object_info with "od 337cf025",
so every shard except osd 990 has "omap_digest_mismatch_oi".
The pg repair code will use osd.990 to fix the other 2 copies without
further handling.
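
The comparison in this second scenario can also be sketched in Python:
pull the "od" value out of selected_object_info and see which shards
report the same omap_digest. This is a rough illustration only; the
function name shards_matching_object_info is hypothetical, the regular
expression assumes the "od <hex>" layout seen in the output quoted in
this thread, and the real repair logic lives in the OSD, not in a
script like this.

```python
import re


def shards_matching_object_info(item):
    """Return the OSD ids whose omap_digest equals the 'od' recorded in
    selected_object_info, or None if the object info records no omap digest."""
    match = re.search(r"\bod ([0-9a-f]+)\b", item["selected_object_info"])
    if match is None:
        return None
    recorded = int(match.group(1), 16)
    return [
        s["osd"] for s in item["shards"] if int(s["omap_digest"], 16) == recorded
    ]


# Trimmed copy of the 5.3d40 entry quoted in this thread.
item = {
    "selected_object_info": (
        "5:02bc4def:::.dir.default.64449186.344176:head(1177700'1180639 "
        "osd.1322.0:537914 dirty|omap|data_digest|omap_digest s 0 uv 1177199 "
        "dd ffffffff od 337cf025 alloc_hint [0 0])"
    ),
    "shards": [
        {"osd": 655, "omap_digest": "0x3242b04e"},
        {"osd": 990, "omap_digest": "0x337cf025"},
        {"osd": 1322, "omap_digest": "0xc90d06a8"},
    ],
}

print(shards_matching_object_info(item))
```

For the 5.3d40 entry this prints [990], the one shard that agrees with
the recorded digest. For an object info string with no "od" field, as
in the 5.f1c0 case, the function returns None, which corresponds to
the scenario where the repair has no recorded digest to go by.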
David
On 9/8/17 11:16 AM, Robin H. Johnson wrote:
On Thu, Sep 07, 2017 at 08:24:04PM +0000, Robin H. Johnson wrote:
pg 5.3d40 is active+clean+inconsistent, acting [1322,990,655]
pg 5.f1c0 is active+clean+inconsistent, acting [631,1327,91]
Here is the output of 'rados list-inconsistent-obj' for the PGs:
$ sudo rados list-inconsistent-obj 5.f1c0 |json_pp -json_opt canonical,pretty
{
"epoch" : 1221254,
"inconsistents" : [
{
"errors" : [
"omap_digest_mismatch"
],
"object" : {
"locator" : "",
"name" : ".dir.default.292886573.13181.12",
"nspace" : "",
"snap" : "head",
"version" : 483490
},
"selected_object_info" : "5:038f1cff:::.dir.default.292886573.13181.12:head(1221843'483490 client.417313345.0:19515832 dirty|omap|data_digest s 0 uv 483490 dd ffffffff alloc_hint [0 0])",
"shards" : [
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x928b0c0b",
"osd" : 91,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x928b0c0b",
"osd" : 631,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x6556c868",
"osd" : 1327,
"size" : 0
}
],
"union_shard_errors" : []
}
]
}
$ sudo rados list-inconsistent-obj 5.3d40 |json_pp -json_opt canonical,pretty
{
"epoch" : 1210895,
"inconsistents" : [
{
"errors" : [
"omap_digest_mismatch"
],
"object" : {
"locator" : "",
"name" : ".dir.default.64449186.344176",
"nspace" : "",
"snap" : "head",
"version" : 1177199
},
"selected_object_info" : "5:02bc4def:::.dir.default.64449186.344176:head(1177700'1180639 osd.1322.0:537914 dirty|omap|data_digest|omap_digest s 0 uv 1177199 dd ffffffff od 337cf025 alloc_hint [0 0])",
"shards" : [
{
"data_digest" : "0xffffffff",
"errors" : [
"omap_digest_mismatch_oi"
],
"omap_digest" : "0x3242b04e",
"osd" : 655,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x337cf025",
"osd" : 990,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [
"omap_digest_mismatch_oi"
],
"omap_digest" : "0xc90d06a8",
"osd" : 1322,
"size" : 0
}
],
"union_shard_errors" : [
"omap_digest_mismatch_oi"
]
}
]
}
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com