On Wednesday 17 April 2013 at 20:52 +0200, Olivier Bonvalet wrote:
> What I don't understand is why the OSD process crashes instead of
> marking that PG as "corrupted". Is that PG really corrupted, or is
> this just an OSD bug?
Once again, a bit more information: while looking into one of these
faulty PGs (3.d), I found this in the OSD log:
-592> 2013-04-20 08:31:56.838280 7f0f41d1b700 0 log [ERR] : 3.d osd.25
inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3
found expected 12d7
-591> 2013-04-20 08:31:56.838284 7f0f41d1b700 0 log [ERR] : 3.d osd.4
inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3
found expected 12d7
-590> 2013-04-20 08:31:56.838290 7f0f41d1b700 0 log [ERR] : 3.d osd.4: soid
a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 size 4194304 != known size 0
-589> 2013-04-20 08:31:56.838292 7f0f41d1b700 0 log [ERR] : 3.d osd.11
inconsistent snapcolls on a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3
found expected 12d7
-588> 2013-04-20 08:31:56.838294 7f0f41d1b700 0 log [ERR] : 3.d osd.11: soid
a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 size 4194304 != known size 0
-587> 2013-04-20 08:31:56.838395 7f0f41d1b700 0 log [ERR] : scrub 3.d
a8620b0d/rb.0.15c26.238e1f29.000000004603/12d7//3 on disk size (0) does not
match object info size (4194304)
I preferred to verify this myself, and found the following:
# md5sum
/var/lib/ceph/osd/ceph-*/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3
217ac2518dfe9e1502e5bfedb8be29b8
/var/lib/ceph/osd/ceph-4/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3
(4MB)
217ac2518dfe9e1502e5bfedb8be29b8
/var/lib/ceph/osd/ceph-11/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3
(4MB)
d41d8cd98f00b204e9800998ecf8427e
/var/lib/ceph/osd/ceph-25/current/3.d_head/DIR_D/DIR_0/DIR_B/DIR_0/rb.0.15c26.238e1f29.000000004603__12d7_A8620B0D__3
(0B)
So this object is identical on OSD 4 and OSD 11, but is empty on OSD 25.
Since OSD 4 is the primary, this should not be a problem, so I tried a
repair, without any success:
ceph pg repair 3.d
Is there a way to force a rewrite of this replica?
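
In case other objects need the same check, the md5sum comparison above
could be scripted. What follows is only a rough sketch in Python, not
anything taken from Ceph itself: the function names (md5_of,
group_replicas, divergent_replicas) are made up for illustration, and it
assumes that the copy held by most OSDs is the good one, which is an
assumption and not something the OSD guarantees.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_replicas(paths):
    """Map (md5, size in bytes) -> list of paths with that content."""
    groups = defaultdict(list)
    for path in paths:
        path = Path(path)
        key = (md5_of(path), path.stat().st_size)
        groups[key].append(str(path))
    return dict(groups)


def divergent_replicas(paths):
    """Return the paths whose content differs from the majority copy.

    Assumption (hypothetical policy): the content shared by the largest
    number of replicas is treated as the reference. On a tie, the group
    seen first wins.
    """
    groups = group_replicas(paths)
    if len(groups) <= 1:
        return []  # all replicas agree, or no files were given
    majority = max(groups, key=lambda key: len(groups[key]))
    return sorted(
        path
        for key, members in groups.items()
        if key != majority
        for path in members
    )
```

Passing the result of glob.glob() on the same ceph-* path used with
md5sum above to divergent_replicas() should, if I am not mistaken, list
only the copy on ceph-25. The script only reads the files; it does not
touch the OSD data.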