Hi,
On a 0.80.7 cluster, a couple of OSDs refuse to start because they
crash while reading the PGLog.
A snippet of the log:
-11> 2014-10-27 21:56:04.690046 7f034a006800 10
filestore(/var/lib/ceph/osd/ceph-25) _do_transaction on 0x392e600
-10> 2014-10-27 21:56:04.690078 7f034a006800 20
filestore(/var/lib/ceph/osd/ceph-25) _check_global_replay_guard no xattr
-9> 2014-10-27 21:56:04.690140 7f034a006800 20
filestore(/var/lib/ceph/osd/ceph-25) _check_replay_guard no xattr
-8> 2014-10-27 21:56:04.690150 7f034a006800 15
filestore(/var/lib/ceph/osd/ceph-25) touch meta/a1630ecd/pglog_14.1a56/0//-1
-7> 2014-10-27 21:56:04.690184 7f034a006800 10
filestore(/var/lib/ceph/osd/ceph-25) touch
meta/a1630ecd/pglog_14.1a56/0//-1 = 0
-6> 2014-10-27 21:56:04.690196 7f034a006800 15
filestore(/var/lib/ceph/osd/ceph-25) _omap_rmkeys
meta/a1630ecd/pglog_14.1a56/0//-1
-5> 2014-10-27 21:56:04.690290 7f034a006800 10 filestore oid:
a1630ecd/pglog_14.1a56/0//-1 not skipping op, *spos 1435883.0.2
-4> 2014-10-27 21:56:04.690295 7f034a006800 10 filestore >
header.spos 0.0.0
-3> 2014-10-27 21:56:04.690314 7f034a006800 0
filestore(/var/lib/ceph/osd/ceph-25) error (1) Operation not permitted
not handled on operation 33 (1435883.0.2, or op 2, counting from 0)
-2> 2014-10-27 21:56:04.690325 7f034a006800 0
filestore(/var/lib/ceph/osd/ceph-25) unexpected error code
-1> 2014-10-27 21:56:04.690327 7f034a006800 0
filestore(/var/lib/ceph/osd/ceph-25) transaction dump:
{ "ops": [
{ "op_num": 0,
"op_name": "nop"},
{ "op_num": 1,
"op_name": "touch",
"collection": "meta",
"oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1"},
{ "op_num": 2,
"op_name": "omap_rmkeys",
"collection": "meta",
"oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1"},
{ "op_num": 3,
"op_name": "omap_setkeys",
"collection": "meta",
"oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1",
"attr_lens": { "can_rollback_to": 12}}]}
0> 2014-10-27 21:56:04.691992 7f034a006800 -1 os/FileStore.cc: In
function 'unsigned int
FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int,
ThreadPool::TPHandle*)' thread 7f034a006800 time 2014-10-27 21:56:04.690368
os/FileStore.cc: 2559: FAILED assert(0 == "unexpected error")
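To make it easier to see which operation failed, the "op 2, counting from 0" in the error line can be matched against the transaction dump. The following is only a rough sketch: the helper name `failing_op` is made up for illustration, and the dump and error line are pasted from the log above with the line wrapping removed.

```python
import errno
import json
import re

# Transaction dump pasted from the log above (raw string keeps the "\/" escapes).
DUMP = r'''
{ "ops": [
    { "op_num": 0, "op_name": "nop"},
    { "op_num": 1, "op_name": "touch", "collection": "meta",
      "oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1"},
    { "op_num": 2, "op_name": "omap_rmkeys", "collection": "meta",
      "oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1"},
    { "op_num": 3, "op_name": "omap_setkeys", "collection": "meta",
      "oid": "a1630ecd\/pglog_14.1a56\/0\/\/-1",
      "attr_lens": { "can_rollback_to": 12}}]}
'''

# Error line from the log above, joined into a single line.
ERROR_LINE = ("error (1) Operation not permitted not handled on operation 33 "
              "(1435883.0.2, or op 2, counting from 0)")


def failing_op(dump_text, error_line):
    """Return the op from the dump that the error line points at.

    Hypothetical helper: it relies only on the "or op N, counting from 0"
    wording visible in this particular log line.
    """
    match = re.search(r"or op (\d+), counting from 0", error_line)
    if match is None:
        raise ValueError("no op index found in error line")
    index = int(match.group(1))
    for op in json.loads(dump_text)["ops"]:
        if op["op_num"] == index:
            return op
    raise LookupError("op %d not present in dump" % index)


op = failing_op(DUMP, ERROR_LINE)
code = int(re.search(r"error \((\d+)\)", ERROR_LINE).group(1))
# Print the symbolic errno name, the op name, and the decoded object id.
print(errno.errorcode[code], op["op_name"], op["oid"])
```

For this log, that points at the `omap_rmkeys` on the pglog object as the step that returned the error, which is why I suspect the omap store.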
The backing XFS filesystem seems to be OK, so could this be a leveldb
issue, since that is where the omap information is stored?
Has anyone seen this before? About 5 OSDs (out of 336) are showing
this problem when booting.
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on