Hello Reed,

I would give PG repair a try.
IIRC there should be no issue when you have size 3... it would be difficult
when you have size 2, I guess...
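
The replica count this rule of thumb depends on can be read with
`ceph osd pool get <pool-name> size`, which prints a line such as "size: 3".
Below is a minimal sketch that parses that output. The helper names
parse_pool_size and replica_note are made up for illustration, the pool name
is a placeholder (the thread does not give it), and the wording of the notes
only restates the rule of thumb above, not any official Ceph guidance.

```python
import re


def parse_pool_size(output):
    """Extract the replica count from text such as 'size: 3'."""
    match = re.search(r"^size:\s*(\d+)\s*$", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'size: N' line found in: %r" % output)
    return int(match.group(1))


def replica_note(size):
    """Hypothetical helper: restate the rule of thumb for a replica count."""
    if size >= 3:
        return "size %d: repair has a majority of copies to compare" % size
    return "size %d: no majority between copies, inspect them first" % size


# Sample text in the shape printed by `ceph osd pool get <pool-name> size`.
# On a live cluster it could be captured with, for example:
#   subprocess.run(["ceph", "osd", "pool", "get", POOL_NAME, "size"],
#                  capture_output=True, text=True, check=True).stdout
sample_output = "size: 3\n"
print(replica_note(parse_pool_size(sample_output)))
```

In this thread all three copies agree with each other and disagree only with
the recorded object info, so the replica count alone may not settle whether
repair is safe.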

Hth
Mehmet

On 29 April 2019 17:05:48 CEST, Reed Dier <[email protected]> wrote:
>Hi list,
>
>Woke up this morning to two PG's reporting scrub errors, in a way that
>I haven't seen before.
>> $ ceph versions
>> {
>>     "mon": {
>>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 3
>>     },
>>     "mgr": {
>>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 3
>>     },
>>     "osd": {
>>         "ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)": 156
>>     },
>>     "mds": {
>>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 2
>>     },
>>     "overall": {
>>         "ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)": 156,
>>         "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)": 8
>>     }
>> }
>
>
>> OSD_SCRUB_ERRORS 8 scrub errors
>> PG_DAMAGED Possible data damage: 2 pgs inconsistent
>>     pg 17.72 is active+clean+inconsistent, acting [3,7,153]
>>     pg 17.2b9 is active+clean+inconsistent, acting [19,7,16]
>
>Here is what `rados list-inconsistent-obj 17.2b9 --format=json-pretty`
>yields:
>> {
>>     "epoch": 134582,
>>     "inconsistents": [
>>         {
>>             "object": {
>>                 "name": "10008536718.00000000",
>>                 "nspace": "",
>>                 "locator": "",
>>                 "snap": "head",
>>                 "version": 0
>>             },
>>             "errors": [],
>>             "union_shard_errors": [
>>                 "obj_size_info_mismatch"
>>             ],
>>             "shards": [
>>                 {
>>                     "osd": 7,
>>                     "primary": false,
>>                     "errors": [
>>                         "obj_size_info_mismatch"
>>                     ],
>>                     "size": 5883,
>>                     "object_info": {
>>                         "oid": {
>>                             "oid": "10008536718.00000000",
>>                             "key": "",
>>                             "snapid": -2,
>>                             "hash": 1752643257,
>>                             "max": 0,
>>                             "pool": 17,
>>                             "namespace": ""
>>                         },
>>                         "version": "134599'448331",
>>                         "prior_version": "134599'448330",
>>                         "last_reqid": "client.1580931080.0:671854",
>>                         "user_version": 448331,
>>                         "size": 3505,
>>                         "mtime": "2019-04-28 15:32:20.003519",
>>                         "local_mtime": "2019-04-28 15:32:25.991015",
>>                         "lost": 0,
>>                         "flags": [
>>                             "dirty",
>>                             "data_digest",
>>                             "omap_digest"
>>                         ],
>>                         "truncate_seq": 899,
>>                         "truncate_size": 0,
>>                         "data_digest": "0xf99a3bd3",
>>                         "omap_digest": "0xffffffff",
>>                         "expected_object_size": 0,
>>                         "expected_write_size": 0,
>>                         "alloc_hint_flags": 0,
>>                         "manifest": {
>>                             "type": 0
>>                         },
>>                         "watchers": {}
>>                     }
>>                 },
>>                 {
>>                     "osd": 16,
>>                     "primary": false,
>>                     "errors": [
>>                         "obj_size_info_mismatch"
>>                     ],
>>                     "size": 5883,
>>                     "object_info": {
>>                         "oid": {
>>                             "oid": "10008536718.00000000",
>>                             "key": "",
>>                             "snapid": -2,
>>                             "hash": 1752643257,
>>                             "max": 0,
>>                             "pool": 17,
>>                             "namespace": ""
>>                         },
>>                         "version": "134599'448331",
>>                         "prior_version": "134599'448330",
>>                         "last_reqid": "client.1580931080.0:671854",
>>                         "user_version": 448331,
>>                         "size": 3505,
>>                         "mtime": "2019-04-28 15:32:20.003519",
>>                         "local_mtime": "2019-04-28 15:32:25.991015",
>>                         "lost": 0,
>>                         "flags": [
>>                             "dirty",
>>                             "data_digest",
>>                             "omap_digest"
>>                         ],
>>                         "truncate_seq": 899,
>>                         "truncate_size": 0,
>>                         "data_digest": "0xf99a3bd3",
>>                         "omap_digest": "0xffffffff",
>>                         "expected_object_size": 0,
>>                         "expected_write_size": 0,
>>                         "alloc_hint_flags": 0,
>>                         "manifest": {
>>                             "type": 0
>>                         },
>>                         "watchers": {}
>>                     }
>>                 },
>>                 {
>>                     "osd": 19,
>>                     "primary": true,
>>                     "errors": [
>>                         "obj_size_info_mismatch"
>>                     ],
>>                     "size": 5883,
>>                     "object_info": {
>>                         "oid": {
>>                             "oid": "10008536718.00000000",
>>                             "key": "",
>>                             "snapid": -2,
>>                             "hash": 1752643257,
>>                             "max": 0,
>>                             "pool": 17,
>>                             "namespace": ""
>>                         },
>>                         "version": "134599'448331",
>>                         "prior_version": "134599'448330",
>>                         "last_reqid": "client.1580931080.0:671854",
>>                         "user_version": 448331,
>>                         "size": 3505,
>>                         "mtime": "2019-04-28 15:32:20.003519",
>>                         "local_mtime": "2019-04-28 15:32:25.991015",
>>                         "lost": 0,
>>                         "flags": [
>>                             "dirty",
>>                             "data_digest",
>>                             "omap_digest"
>>                         ],
>>                         "truncate_seq": 899,
>>                         "truncate_size": 0,
>>                         "data_digest": "0xf99a3bd3",
>>                         "omap_digest": "0xffffffff",
>>                         "expected_object_size": 0,
>>                         "expected_write_size": 0,
>>                         "alloc_hint_flags": 0,
>>                         "manifest": {
>>                             "type": 0
>>                         },
>>                         "watchers": {}
>>                     }
>>                 }
>>             ]
>>         }
>>     ]
>> }
>
>To snip that down to the parts that appear to matter:
>>      "errors": [],
>>         "union_shard_errors": [
>>             "obj_size_info_mismatch"
>>             ],
>>             "shards": [
>>                 {
>>                     "errors": [
>>                         "obj_size_info_mismatch"
>>                     ],
>>                     "size": 5883,
>>                     "object_info": {
>>                        "size": 3505, }
>
>It looks like the size info does in fact mismatch (5883 != 3505).
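
The same comparison can be made for every shard at once instead of reading
the JSON by eye. Below is a minimal sketch, assuming the report has the shape
shown above ("inconsistents" -> "shards" -> "size" and "object_info.size").
The function name summarize_size_mismatches is made up for illustration, and
the sample is a trimmed copy of the quoted output, not fresh cluster data.

```python
import json


def summarize_size_mismatches(report):
    """Return (object name, osd, on-disk size, object_info size) per shard."""
    rows = []
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        for shard in item.get("shards", []):
            info = shard.get("object_info")
            # object_info may be missing on a damaged shard; record None then.
            info_size = info.get("size") if isinstance(info, dict) else None
            rows.append((name, shard["osd"], shard.get("size"), info_size))
    return rows


# Trimmed copy of the `rados list-inconsistent-obj 17.2b9` output quoted above.
sample = json.loads("""
{
  "epoch": 134582,
  "inconsistents": [
    {
      "object": {"name": "10008536718.00000000", "snap": "head"},
      "union_shard_errors": ["obj_size_info_mismatch"],
      "shards": [
        {"osd": 7,  "primary": false, "size": 5883, "object_info": {"size": 3505}},
        {"osd": 16, "primary": false, "size": 5883, "object_info": {"size": 3505}},
        {"osd": 19, "primary": true,  "size": 5883, "object_info": {"size": 3505}}
      ]
    }
  ]
}
""")

for name, osd, disk_size, info_size in summarize_size_mismatches(sample):
    status = "mismatch" if disk_size != info_size else "ok"
    print(name, "osd", osd, "on-disk", disk_size, "info", info_size, status)
```

To run it against a live PG, `sample` could be replaced with
`json.load(sys.stdin)` and the output of
`rados list-inconsistent-obj <pgid> --format=json` piped in.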
>
>So I attempted a deep-scrub again, and the issue persists across both
>PG's.
>> 2019-04-29 09:08:27.729 7fe4f5bee700  0 log_channel(cluster) log [DBG] : 17.2b9 deep-scrub starts
>> 2019-04-29 09:22:53.363 7fe4f5bee700 -1 log_channel(cluster) log [ERR] : 17.2b9 shard 19 soid 17:9d6cee16:::10008536718.00000000:head : candidate size 5883 info size 3505 mismatch
>> 2019-04-29 09:22:53.363 7fe4f5bee700 -1 log_channel(cluster) log [ERR] : 17.2b9 shard 7 soid 17:9d6cee16:::10008536718.00000000:head : candidate size 5883 info size 3505 mismatch
>> 2019-04-29 09:22:53.363 7fe4f5bee700 -1 log_channel(cluster) log [ERR] : 17.2b9 shard 16 soid 17:9d6cee16:::10008536718.00000000:head : candidate size 5883 info size 3505 mismatch
>> 2019-04-29 09:22:53.363 7fe4f5bee700 -1 log_channel(cluster) log [ERR] : 17.2b9 soid 17:9d6cee16:::10008536718.00000000:head : failed to pick suitable object info
>> 2019-04-29 09:22:53.363 7fe4f5bee700 -1 log_channel(cluster) log [ERR] : deep-scrub 17.2b9 17:9d6cee16:::10008536718.00000000:head : on disk size (5883) does not match object info size (3505) adjusted for ondisk to (3505)
>> 2019-04-29 09:27:46.840 7fe4f5bee700 -1 log_channel(cluster) log [ERR] : 17.2b9 deep-scrub 4 errors
>
>Pool 17 is a cephfs data pool, if that makes any difference.
>And the two MDS's listed in versions are active:standby, not
>active:active.
>
>My question is whether I should run `ceph pg repair <pgid>` to try to
>fix these objects, or take another approach, as the object size
>mismatch appears to persist across all 3 copies of the PG(s).
>I know that ceph pg repair can be dangerous in certain circumstances,
>so I want to feel confident in the operation before undertaking the
>repair.
>
>I did look at all underlying disks for these PG's for issues or errors,
>and none bubbled to the top, so I don't believe it to be a hardware
>issue in this case.
>
>Appreciate any help.
>
>Thanks,
>
>Reed
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
