Hello,

Yesterday we upgraded a Mimic cluster to v14.2.10. Everything was running and
OK.

There was one new warning, "2 pool(s) have non-power-of-two pg_num". To get
back to a HEALTH_OK state until we can expand these pools, I found this config
option to suppress the warning:

ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false

Setting it resulted in a crash of 40 OSD processes (about 60% of the cluster).
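
For context on the warning itself: it is raised for any pool whose pg_num is not a power of two. A minimal sketch of that check, and of the next power of two such a pool would have to be expanded to, could look like the following. The pool names and pg_num values are hypothetical and are not taken from this cluster.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (requires n >= 1)."""
    if n < 1:
        raise ValueError("pg_num must be at least 1")
    return 1 << (n - 1).bit_length()


# Hypothetical pools: names and pg_num values are invented for illustration.
pools = {"pool-a": 100, "pool-b": 768, "pool-c": 256}

for name, pg_num in pools.items():
    if not is_power_of_two(pg_num):
        target = next_power_of_two(pg_num)
        print(f"{name}: pg_num {pg_num} is not a power of two, next is {target}")
```

This only illustrates the arithmetic behind the warning; it does not talk to the cluster.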

No restart was possible; the OSDs always hit the same crash:

2020-06-30 21:13:56.179 7fd2b7708c00 -1 osd.30 385679 log_to_monitors {default=true}
*** Caught signal (Segmentation fault) **
 in thread 7fd2a5813700 thread_name:fn_odsk_fstore
 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0x11390) [0x7fd2b53a3390]
 2: /usr/bin/ceph-osd() [0x87fd12]
 3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
 4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
 5: (Context::complete(int)+0x9) [0x8fbfb9]
 6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
 7: (()+0x76ba) [0x7fd2b53996ba]
 8: (clone()+0x6d) [0x7fd2b49a041d]
2020-06-30 21:13:56.199 7fd2a5813700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd2a5813700 thread_name:fn_odsk_fstore

 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0x11390) [0x7fd2b53a3390]
 2: /usr/bin/ceph-osd() [0x87fd12]
 3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
 4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
 5: (Context::complete(int)+0x9) [0x8fbfb9]
 6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
 7: (()+0x76ba) [0x7fd2b53996ba]
 8: (clone()+0x6d) [0x7fd2b49a041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

 -1547> 2020-06-30 21:13:51.171 7fd2b7708c00 -1 missing 'type' file, inferring filestore from current/ dir
  -738> 2020-06-30 21:13:56.179 7fd2b7708c00 -1 osd.30 385679 log_to_monitors {default=true}
     0> 2020-06-30 21:13:56.199 7fd2a5813700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd2a5813700 thread_name:fn_odsk_fstore

 ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
 1: (()+0x11390) [0x7fd2b53a3390]
 2: /usr/bin/ceph-osd() [0x87fd12]
 3: (OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*)+0x5e1) [0x8f0f91]
 4: (C_OnMapCommit::finish(int)+0x17) [0x946897]
 5: (Context::complete(int)+0x9) [0x8fbfb9]
 6: (Finisher::finisher_thread_entry()+0x15e) [0xeb2b8e]
 7: (()+0x76ba) [0x7fd2b53996ba]
 8: (clone()+0x6d) [0x7fd2b49a041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is a mixed cluster of Ubuntu Xenial and Bionic hosts, and it happens on both.

It looks like it happens when the new monmap arrives at the OSD.

The only fix I was able to come up with was to downgrade ceph-osd to v14.2.9.

Should I open a bug report?

Regards

Markus