Good Morning,
I have an odd situation where a pg is listed as inconsistent, but rados
fails to list the inconsistent objects in it:
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 requests are blocked > 32 sec; 1 osds have slow requests; 1 scrub errors
pg 22.1611 is active+clean+inconsistent, acting [294,1080,970,324,722,70,949,874,943,606,518]
1 scrub errors
# rados list-inconsistent-pg .us-smr.rgw.buckets
["22.1611"]
# rados list-inconsistent-obj 22.1611
[]error 2: (2) No such file or directory
A little background: I got into this state because the inconsistent pg popped
up in ceph -s. I used list-inconsistent-obj to find which osd was causing the
problem:
{
"osd": 497,
"missing": false,
"read_error": true,
"data_digest_mismatch": false,
"omap_digest_mismatch": false,
"size_mismatch": false,
"size": 599488
},
Because it was a read error, I checked the SMART stats for that osd's disk and,
sure enough, it had some uncorrected read errors. To stop it from causing
more problems, I stopped the daemon to let ceph recover from the other osds. The
cluster has now finished rebalancing, but remains in ERR state as it still
thinks this pg is inconsistent.
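For anyone who wants to repeat the "which osd is the problem" step, here is a
minimal sketch in Python of how the shard entries can be filtered. It only
assumes the field names visible in the excerpt above. The SAMPLE wrapper (a
bare list holding one shard) and the helper name flagged_shards are
illustrative assumptions, since the real list-inconsistent-obj output nests
shards inside per-object records.

```python
import json

# The single shard entry is copied from the excerpt above. Wrapping it in a
# bare list is an assumption for illustration only; the real output nests
# shards inside per-object records.
SAMPLE = """
[
  {"osd": 497, "missing": false, "read_error": true,
   "data_digest_mismatch": false, "omap_digest_mismatch": false,
   "size_mismatch": false, "size": 599488}
]
"""

# Boolean error flags that appear in the shard entry shown in this message.
ERROR_FLAGS = ("missing", "read_error", "data_digest_mismatch",
               "omap_digest_mismatch", "size_mismatch")


def flagged_shards(shards):
    """Map each osd id to the list of error flags that are true for it.

    Shards with no true flag are left out of the result.
    """
    result = {}
    for shard in shards:
        flags = [name for name in ERROR_FLAGS if shard.get(name)]
        if flags:
            result[shard["osd"]] = flags
    return result


if __name__ == "__main__":
    print(flagged_shards(json.loads(SAMPLE)))
```

With the shard above, this reports osd 497 with only read_error set, which is
what pointed me at the disk rather than at a digest or size mismatch.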
ceph pg query output is here: https://hastebin.com/mamesokexa.cpp
Thanks,
Aaron