Hello
I think I’m running into the bug described at
http://tracker.ceph.com/issues/14213 for Hammer.
However, I’m running the latest Jewel release (10.2.7), although I’m in the
middle of upgrading the cluster from 10.2.5. At first the problem appeared on a
couple of nodes, but it now seems to be more pervasive.
I have seen this issue with osd_map_cache_size set to 20 as well as 500, a
value I increased to try to compensate for it.
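For reference, this is roughly how the value was changed — the ceph.conf stanza
persists across restarts, while the injectargs form only applies to running
daemons and reverts on restart:

```shell
# Persistent: add to ceph.conf under [osd], then restart the OSDs:
#   [osd]
#   osd map cache size = 500

# Runtime only (reverts when the daemon restarts):
ceph tell osd.* injectargs '--osd-map-cache-size 500'
```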
My two questions are:
1) Is this fixed, and if so, in which version?
2) Is there a way to recover the damaged OSD metadata? I really don’t want to
keep rebuilding large numbers of disks over something seemingly arbitrary.
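For what it’s worth, the kind of recovery I’m hoping for would look something
like the following ceph-objectstore-tool sketch. The commands are from the
Jewel tooling and epoch 863078 comes from the crash below, but I haven’t
verified this is safe to run against a damaged store:

```shell
# Stop the crashing OSD first, e.g.:
#   systemctl stop ceph-osd@1908

# Fetch the missing osdmap epoch reported in the crash from the monitors:
ceph osd getmap 863078 -o /tmp/osdmap.863078

# Inject it into the OSD's local store (paths match this cluster's layout):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/txc1-1908 \
    --journal-path /var/lib/ceph/osd/txc1-1908/journal \
    --op set-osdmap --file /tmp/osdmap.863078
```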
SEEK_HOLE is disabled via 'filestore seek data hole' config option
-31> 2017-05-24 10:23:10.152349 7f24035e2800 0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
-30> 2017-05-24 10:23:10.182065 7f24035e2800 0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-29> 2017-05-24 10:23:10.182112 7f24035e2800 0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
-28> 2017-05-24 10:23:10.182839 7f24035e2800 1 leveldb: Recovering log #23079
-27> 2017-05-24 10:23:10.284173 7f24035e2800 1 leveldb: Delete type=0 #23079
-26> 2017-05-24 10:23:10.284223 7f24035e2800 1 leveldb: Delete type=3 #23078
-25> 2017-05-24 10:23:10.284807 7f24035e2800 0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
-24> 2017-05-24 10:23:10.285581 7f24035e2800 2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
-23> 2017-05-24 10:23:10.289523 7f24035e2800 1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-22> 2017-05-24 10:23:10.293733 7f24035e2800 2 journal open advancing committed_seq 20363681 to fs op_seq 20363902
-21> 2017-05-24 10:23:10.293743 7f24035e2800 2 journal read_entry -- not readable
-20> 2017-05-24 10:23:10.293744 7f24035e2800 2 journal read_entry -- not readable
-19> 2017-05-24 10:23:10.293745 7f24035e2800 3 journal journal_replay: end of journal, done.
-18> 2017-05-24 10:23:10.297605 7f24035e2800 1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-17> 2017-05-24 10:23:10.298470 7f24035e2800 1 filestore(/var/lib/ceph/osd/txc1-1908) upgrade
-16> 2017-05-24 10:23:10.298509 7f24035e2800 2 osd.1908 0 boot
-15> 2017-05-24 10:23:10.300096 7f24035e2800 1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-14> 2017-05-24 10:23:10.300384 7f24035e2800 1 <cls> cls/user/cls_user.cc:375: Loaded user class!
-13> 2017-05-24 10:23:10.300617 7f24035e2800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
-12> 2017-05-24 10:23:10.303748 7f24035e2800 1 <cls> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
-11> 2017-05-24 10:23:10.304120 7f24035e2800 1 <cls> cls/version/cls_version.cc:228: Loaded version class!
-10> 2017-05-24 10:23:10.304439 7f24035e2800 1 <cls> cls/log/cls_log.cc:317: Loaded log class!
-9> 2017-05-24 10:23:10.307437 7f24035e2800 1 <cls> cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
-8> 2017-05-24 10:23:10.307768 7f24035e2800 1 <cls> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
-7> 2017-05-24 10:23:10.307927 7f24035e2800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
-6> 2017-05-24 10:23:10.308086 7f24035e2800 1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-5> 2017-05-24 10:23:10.315241 7f24035e2800 0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for clients
-4> 2017-05-24 10:23:10.315258 7f24035e2800 0 osd.1908 863035 crush map has features 2234490552320 was 8705, adjusting msgr requires for mons
-3> 2017-05-24 10:23:10.315267 7f24035e2800 0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for osds
-2> 2017-05-24 10:23:10.441444 7f24035e2800 0 osd.1908 863035 load_pgs
-1> 2017-05-24 10:23:10.442608 7f24035e2800 -1 osd.1908 863035 load_pgs: have pgid 11.3f5a at epoch 863078, but missing map. Crashing.
0> 2017-05-24 10:23:10.444151 7f24035e2800 -1 osd/OSD.cc: In function 'void OSD::load_pgs()' thread 7f24035e2800 time 2017-05-24 10:23:10.442617
osd/OSD.cc: 3189: FAILED assert(0 == "Missing map in load_pgs")
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x55d1874be6db]
2: (OSD::load_pgs()+0x1f9b) [0x55d186e6a26b]
3: (OSD::init()+0x1f74) [0x55d186e7aec4]
4: (main()+0x29d1) [0x55d186de1d71]
5: (__libc_start_main()+0xf5) [0x7f24004fdf45]
6: (()+0x356a47) [0x55d186e2aa47]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Regards
Stuart Harland
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com