Hello
I think I’m running into the bug described at
http://tracker.ceph.com/issues/14213 for Hammer.
However, I’m running the latest Jewel release (10.2.7), although I’m in the
middle of upgrading the cluster from 10.2.5. At first the problem appeared on a
couple of nodes, but it now seems to be more pervasive.
I have seen this issue with osd_map_cache_size set to 20 as well as 500, a
value I increased to try to compensate for it.
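For reference, this is roughly how the value was changed — the ceph.conf stanza
persists across restarts, while the injectargs form only applies to running
daemons and reverts on restart:

```shell
# Persistent: add to ceph.conf under [osd], then restart the OSDs:
#   [osd]
#   osd map cache size = 500

# Runtime only (reverts when the daemon restarts):
ceph tell osd.* injectargs '--osd-map-cache-size 500'
```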
My two questions are:
1) Is this fixed, and if so, in which version?
2) Is there a way to recover the damaged OSD metadata? I really don’t want to
keep rebuilding large numbers of disks over something seemingly arbitrary.
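For what it’s worth, the kind of recovery I’m hoping for would look something
like the following ceph-objectstore-tool sketch. The commands are from the
Jewel tooling and epoch 863078 comes from the crash below, but I haven’t
verified this is safe to run against a damaged store:

```shell
# Stop the crashing OSD first, e.g.:
#   systemctl stop ceph-osd@1908

# Fetch the missing osdmap epoch reported in the crash from the monitors:
ceph osd getmap 863078 -o /tmp/osdmap.863078

# Inject it into the OSD's local store (paths match this cluster's layout):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/txc1-1908 \
    --journal-path /var/lib/ceph/osd/txc1-1908/journal \
    --op set-osdmap --file /tmp/osdmap.863078
```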
SEEK_HOLE is disabled via 'filestore seek data hole' config option
-31> 2017-05-24 10:23:10.152349 7f24035e2800 0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
-30> 2017-05-24 10:23:10.182065 7f24035e2800 0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
-29> 2017-05-24 10:23:10.182112 7f24035e2800 0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
-28> 2017-05-24 10:23:10.182839 7f24035e2800 1 leveldb: Recovering log #23079
-27> 2017-05-24 10:23:10.284173 7f24035e2800 1 leveldb: Delete type=0 #23079
-26> 2017-05-24 10:23:10.284223 7f24035e2800 1 leveldb: Delete type=3 #23078
-25> 2017-05-24 10:23:10.284807 7f24035e2800 0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
-24> 2017-05-24 10:23:10.285581 7f24035e2800 2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
-23> 2017-05-24 10:23:10.289523 7f24035e2800 1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-22> 2017-05-24 10:23:10.293733 7f24035e2800 2 journal open advancing committed_seq 20363681 to fs op_seq 20363902
-21> 2017-05-24 10:23:10.293743 7f24035e2800 2 journal read_entry -- not readable
-20> 2017-05-24 10:23:10.293744 7f24035e2800 2 journal read_entry -- not readable
-19> 2017-05-24 10:23:10.293745 7f24035e2800 3 journal journal_replay: end of journal, done.
-18> 2017-05-24 10:23:10.297605 7f24035e2800 1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
-17> 2017-05-24 10:23:10.298470 7f24035e2800 1 filestore(/var/lib/ceph/osd/txc1-1908) upgrade
-16> 2017-05-24 10:23:10.298509 7f24035e2800 2 osd.1908 0 boot
-15> 2017-05-24 10:23:10.300096 7f24035e2800 1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-14> 2017-05-24 10:23:10.300384 7f24035e2800 1 <cls> cls/user/cls_user.cc:375: Loaded user class!
-13> 2017-05-24 10:23:10.300617 7f24035e2800 0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
-12> 2017-05-24 10:23:10.303748 7f24035e2800 1 <cls> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
-11> 2017-05-24 10:23:10.304120 7f24035e2800 1 <cls> cls/version/cls_version.cc:228: Loaded version class!
-10> 2017-05-24 10:23:10.304439 7f24035e2800 1 <cls> cls/log/cls_log.cc:317: Loaded log class!
-9> 2017-05-24 10:23:10.307437 7f24035e2800 1 <cls> cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
-8> 2017-05-24 10:23:10.307768 7f24035e2800 1 <cls> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
-7> 2017-05-24 10:23:10.307927 7f24035e2800 0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
-6> 2017-05-24 10:23:10.308086 7f24035e2800 1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-5> 2017-05-24 10:23:10.315241 7f24035e2800 0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for clients
-4> 2017-05-24 10:23:10.315258 7f24035e2800 0 osd.1908 863035 crush map has features 2234490552320 was 8705, adjusting msgr requires for mons
-3> 2017-05-24 10:23:10.315267 7f24035e2800 0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for osds
-2> 2017-05-24 10:23:10.441444 7f24035e2800 0 osd.1908 863035 load_pgs
-1> 2017-05-24 10:23:10.442608 7f24035e2800 -1 osd.1908 863035 load_pgs: have pgid 11.3f5a at epoch 863078, but missing map. Crashing.
0> 2017-05-24 10:23:10.444151 7f24035e2800 -1 osd/OSD.cc: In function 'void OSD::load_pgs()' thread 7f24035e2800 time 2017-05-24 10:23:10.442617
osd/OSD.cc: 3189: FAILED assert(0 == "Missing map in load_pgs")
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x55d1874be6db]
2: (OSD::load_pgs()+0x1f9b) [0x55d186e6a26b]
3: (OSD::init()+0x1f74) [0x55d186e7aec4]
4: (main()+0x29d1) [0x55d186de1d71]
5: (__libc_start_main()+0xf5) [0x7f24004fdf45]
6: (()+0x356a47) [0x55d186e2aa47]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Regards
Stuart Harland
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com