Hi,

I did a diff on the directories of all three OSDs; there was no difference, so I don't know what's wrong.

The only thing I see that differs is a scrub file in the TEMP folder (it is a different pg from the one in my last mail):

-rw-r--r-- 1 ceph ceph 0 Aug 9 09:51 scrub\u6.107__head_00000107__fffffffffffffff8

But it is empty.

Thanks!


On 09/08/16 04:33, Goncalo Borges wrote:
Hi Kenneth...

The previous default behavior of 'ceph pg repair' was to copy the pg objects from the primary OSD to the others; I am not sure if that is still the case in Jewel. For this reason, once we get this kind of error in a data pool, the best practice is to compare the md5 checksums of the damaged object on all OSDs involved in the inconsistent pg. Since we have a 3-replica cluster, we should find a quorum of 2 good objects. If by chance the primary OSD has the wrong object, you should delete it before running the repair.
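
For a data pool the comparison can be scripted roughly like this (a minimal, untested sketch; the hostnames, OSD ids, pgid and object filename are hypothetical placeholders, and the exact on-disk filename pattern depends on the filestore escaping):

    # Compare the md5 of one damaged object across the three replicas.
    # All names below are placeholders -- take the real acting set from
    # 'ceph health detail' and the object filename from the pg directory.
    pgid=5.161
    obj='myobject__head_*'
    for pair in host1:0 host2:4 host3:7; do
        host=${pair%%:*}; id=${pair##*:}
        echo "== osd.$id on $host =="
        ssh "$host" "find /var/lib/ceph/osd/ceph-$id/current/${pgid}_head -name '$obj' -exec md5sum {} +"
    done

Two matching checksums identify the good copies; the odd one out is the object to delete before the repair.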

On a metadata pool, I am not sure exactly how to cross-check, since all objects have size 0 and therefore an md5sum is meaningless. Maybe one way forward could be to check the contents of the pg directories (e.g. /var/lib/ceph/osd/ceph-0/current/5.161_head/) on all OSDs involved in the pg and see if we spot something wrong?
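
Also, since the scrub error quoted below is about a mismatching omap_digest, and on a metadata pool the real content of those size-0 objects lives in omap, another idea is to compare the omap keys of the suspect object across the replicas. A rough sketch with ceph-objectstore-tool (untested; the tool needs the OSD stopped first, and please double-check the op names against your Jewel build):

    # Run on each OSD host of the acting set [3,5,1]; osd.3 shown.
    systemctl stop ceph-osd@3
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --journal-path /var/lib/ceph/osd/ceph-3/journal \
        --pgid 6.2f4 '606.00000000' list-omap | md5sum
    systemctl start ceph-osd@3

Note that list-omap only lists the keys; if the key sets agree, the values would still have to be compared (get-omap per key).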

Cheers

G.


On 08/08/2016 09:40 PM, Kenneth Waegeman wrote:
Hi all,

Since last week, some pgs have been going into the inconsistent state after a scrub error. Last week we had 4 pgs in that state; they were on different OSDs, but all in the metadata pool. I did a pg repair on them and all were healthy again. But now one pg is inconsistent again.

with health detail I see:

pg 6.2f4 is active+clean+inconsistent, acting [3,5,1]
1 scrub errors

And in the log of the primary:

2016-08-06 06:24:44.723224 7fc5493f3700 -1 log_channel(cluster) log [ERR] : 6.2f4 shard 5: soid 6:2f55791f:::606.00000000:head omap_digest 0x3a105358 != best guess omap_digest 0xc85c4361 from auth shard 1
2016-08-06 06:24:53.931029 7fc54bbf8700 -1 log_channel(cluster) log [ERR] : 6.2f4 deep-scrub 0 missing, 1 inconsistent objects
2016-08-06 06:24:53.931055 7fc54bbf8700 -1 log_channel(cluster) log [ERR] : 6.2f4 deep-scrub 1 errors

I looked in dmesg but I couldn't see any IO errors on any of the OSDs in the acting set, and last week it was another set. It is of course possible that more than one OSD is failing, but how can we check this, since there is nothing more in the logs?
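
One thing that might help pinpoint the bad shard, assuming it is available in our Jewel release, is 'rados list-inconsistent-obj', which should dump what the last deep scrub recorded for the pg, per shard:

    rados list-inconsistent-obj 6.2f4 --format=json-pretty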

Thanks !!

K


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
