On 18/06/14 11:58, Jan Kalcic wrote:
Hi all,

I am able to manually deploy a new ceph cluster by successfully
bootstrapping the first monitor:

# ceph -s
     cluster 926daa03-5e59-4ae1-a0bd-401a227e74c7
      health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean;
no osds
      monmap e1: 1 mons at {linux-c904=172.17.43.101:6789/0}, election
epoch 2, quorum 0 linux-c904
      osdmap e1: 0 osds: 0 up, 0 in
       pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
             0 kB used, 0 kB / 0 kB avail
                  192 creating


However, after rebooting the system, or if I try to restart the monitor, I
always get the same segfault:

/etc/init.d/ceph -v start mon.linux-c904
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "user"
=== mon.linux-c904 ===
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "run dir"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "pid file"
--- linux-c904# mkdir -p /var/run/ceph
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "log dir"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "auto start"
--- linux-c904# [ -e /var/run/ceph/mon.linux-c904.pid ] || exit 1 # no
pid, presumably not running
         pid=`cat /var/run/ceph/mon.linux-c904.pid`
         [ -e /proc/$pid ] && grep -q ceph-mon /proc/$pid/cmdline &&
grep -qwe -i.linux-c904 /proc/$pid/cmdline && exit 0 # running
         exit 1  # pid is something else
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "copy
executable to"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "lock file"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "admin socket"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "max open
files"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "restart on
core dump"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "valgrind"
Starting Ceph mon.linux-c904 on linux-c904...
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "pre start
eval"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "pre start
command"
/usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n mon.linux-c904 "post start
command"
--- linux-c904# ulimit -n 32768;  /usr/bin/ceph-mon -i linux-c904
--pid-file /var/run/ceph/mon.linux-c904.pid -c /etc/ceph/ceph.conf
--cluster ceph
*** Caught signal (Segmentation fault) **
  in thread 7f2028f0a780
  ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
  1: /usr/bin/ceph-mon() [0x89419d]
  2: (()+0xf7c0) [0x7f202887e7c0]
  3: (()+0x61c0) [0x7f2026b621c0]
  4: (_ULx86_64_step()+0x9) [0x7f2026b632a9]
  5: (()+0x393a5) [0x7f2028ac53a5]
  6: (GetStackTrace(void**, int, int)+0xe) [0x7f2028ac4d1e]
  7: (tcmalloc::PageHeap::GrowHeap(unsigned long)+0x10f) [0x7f2028ab4b5f]
  8: (tcmalloc::PageHeap::New(unsigned long)+0xbb) [0x7f2028ab52ab]
  9: (tcmalloc::CentralFreeList::Populate()+0x7b) [0x7f2028ab30ab]
  10: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**,
void**)+0x58) [0x7f2028ab32e8]
  11: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x8b)
[0x7f2028ab33ab]
  12: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long,
unsigned long)+0x69) [0x7f2028ab7719]
  13: (()+0x1994b) [0x7f2028aa594b]
  14: (tc_new()+0x18) [0x7f2028ac63c8]
  15: (std::string::_Rep::_S_create(unsigned long, unsigned long,
std::allocator<char> const&)+0x59) [0x7f202762f6a9]
  16: (std::string::_M_mutate(unsigned long, unsigned long, unsigned
long)+0x63) [0x7f202762f8a3]
  17: (std::string::_M_replace_safe(unsigned long, unsigned long, char
const*, unsigned long)+0x2c) [0x7f202762fa3c]
  18: (leveldb::DBImpl::RecoverLogFile(unsigned long,
leveldb::VersionEdit*, unsigned long*)+0x425) [0x7f20278950b5]
  19: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x655)
[0x7f2027895a45]
  20: (leveldb::DB::Open(leveldb::Options const&, std::string const&,
leveldb::DB**)+0xeb) [0x7f2027895dfb]
  21: (LevelDBStore::do_open(std::ostream&, bool)+0x10d) [0x840b7d]
  22: (main()+0x14a5) [0x53d035]
  23: (__libc_start_main()+0xe6) [0x7f2026d88c36]
  24: /usr/bin/ceph-mon() [0x53a209]
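
The backtrace above dies inside tcmalloc's own stack-walking
(GetStackTrace, via libunwind's _ULx86_64_step) while leveldb is
replaying its write-ahead log, which hints at a bad interaction between
the installed tcmalloc/libunwind and leveldb rather than the store
contents themselves. A symbolized core dump would narrow this down. A
minimal sketch (`run_with_cores` is a hypothetical helper, and the gdb
step is the standard core-dump pattern, nothing ceph-specific):

```shell
# Hypothetical helper: raise the core-size limit, then run the given
# command in the foreground so a crash leaves a core file (where the
# core lands depends on the kernel's core_pattern setting).
run_with_cores() {
    ulimit -c unlimited 2>/dev/null || true
    "$@"
}

# On the affected host you would run something like:
#   run_with_cores /usr/bin/ceph-mon -i linux-c904 \
#       -c /etc/ceph/ceph.conf -d     # -d: foreground, log to stderr
# and after the crash:
#   gdb /usr/bin/ceph-mon core -ex bt -ex quit
```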


If I clean everything up and start over, the monitor starts
successfully again:

# rm -rf /var/lib/ceph/mon/ceph-linux-c904/*
# rm /tmp/ceph.mon.keyring
# rm /etc/ceph/ceph.client.admin.keyring

# ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon.
--cap mon 'allow *'
# ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring
--gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd
'allow *' --cap mds 'allow'
# ceph-authtool /tmp/ceph.mon.keyring --import-keyring
/etc/ceph/ceph.client.admin.keyring
# monmaptool --create --add linux-c904 172.17.43.101 --fsid
926daa03-5e59-4ae1-a0bd-401a227e74c7 /tmp/monmap
# ceph-mon --mkfs -i linux-c904 --monmap /tmp/monmap --keyring
/tmp/ceph.mon.keyring
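
A small pre-check before the mkfs step avoids confusing failures from a
missing or empty keyring/monmap, e.g. if /tmp was cleaned between steps.
A sketch (`verify_bootstrap_inputs` is a hypothetical helper, not part
of ceph):

```shell
# Hypothetical helper: fail early if any of the files the mkfs step
# depends on is missing or empty.
verify_bootstrap_inputs() {
    local f
    for f in "$@"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
}

# Usage (paths as in the commands above):
#   verify_bootstrap_inputs /tmp/ceph.mon.keyring /tmp/monmap &&
#       ceph-mon --mkfs -i linux-c904 --monmap /tmp/monmap \
#           --keyring /tmp/ceph.mon.keyring
```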

# /etc/init.d/ceph start mon.linux-c904
=== mon.linux-c904 ===
Starting Ceph mon.linux-c904 on linux-c904...
Starting ceph-create-keys on linux-c904...


The following is the content of my ceph.conf file:

[global]
fsid = 926daa03-5e59-4ae1-a0bd-401a227e74c7
mon initial members = linux-c904
mon host = 172.17.43.101
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1


What's wrong with it?

What's your leveldb and google-perftools/tcmalloc version?
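
One quick way to answer that on the affected host is to check which
shared libraries ceph-mon actually resolves at load time, alongside the
package manager's versions. A sketch (the rpm package names below are a
guess; adjust for your distro):

```shell
# Print the shared libraries a binary resolves at load time, filtered
# by an optional grep pattern (default: show all).
check_linked_libs() {
    local binary=$1 pattern=${2:-.}
    ldd "$binary" | grep -E "$pattern"
}

# On the affected host:
#   check_linked_libs /usr/bin/ceph-mon 'leveldb|tcmalloc|unwind'
#   rpm -q leveldb gperftools   # package names are a guess; adjust
```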

  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
