Hi Greg,
Sure, should have gathered that myself...
(gdb) bt
#0 0x00007f071a05020b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000009a996d in reraise_fatal (signum=11) at
global/signal_handler.cc:59
#2 handle_fatal_signal (signum=11) at global/signal_handler.cc:109
#3 <signal handler called>
#4 crush_do_rule (map=0x52b0d40, ruleno=<optimized out>, x=211857128,
result=0x7fff2f78dcc0, result_max=8, weight=0x53ae5a0, weight_max=120,
scratch=<optimized out>) at crush/mapper.c:937
#5 0x00000000007a85cb in do_rule (weight=..., maxout=8, out=..., x=211857128,
rule=2, this=0x536a680) at ./crush/CrushWrapper.h:1026
#6 OSDMap::_pg_to_osds (this=this@entry=0x53ec088, pool=..., pg=...,
osds=osds@entry=0x7fff2f78dd80, primary=primary@entry=0x7fff2f78de40,
ppps=ppps@entry=0x7fff2f78dd74)
at osd/OSDMap.cc:1521
#7 0x00000000007a8a64 in OSDMap::pg_to_raw_up (this=this@entry=0x53ec088,
pg=..., up=up@entry=0x7fff2f78de60, primary=primary@entry=0x7fff2f78de40) at
osd/OSDMap.cc:1676
#8 0x00000000007ab8f7 in OSDMap::remove_redundant_temporaries (cct=0x5272000,
osdmap=..., pending_inc=pending_inc@entry=0x53ec298) at osd/OSDMap.cc:1198
#9 0x000000000060fdb9 in OSDMonitor::create_pending (this=0x53ec000) at
mon/OSDMonitor.cc:885
#10 0x00000000006047b9 in PaxosService::_active (this=this@entry=0x53ec000) at
mon/PaxosService.cc:272
#11 0x0000000000604ad7 in PaxosService::election_finished (this=0x53ec000) at
mon/PaxosService.cc:250
#12 0x00000000005c34a6 in Monitor::win_election (this=this@entry=0x52bab00,
epoch=epoch@entry=1, active=..., features=features@entry=1125899906842623,
cmdset=0xd14f80 <mon_commands>,
cmdsize=168, classic_monitors=classic_monitors@entry=0x0) at
mon/Monitor.cc:1848
#13 0x00000000005c388c in Monitor::win_standalone_election
(this=this@entry=0x52bab00) at mon/Monitor.cc:1803
#14 0x00000000005c42eb in Monitor::bootstrap (this=this@entry=0x52bab00) at
mon/Monitor.cc:929
#15 0x00000000005c4645 in Monitor::init (this=0x52bab00) at mon/Monitor.cc:742
#16 0x00000000005769c0 in main (argc=<optimized out>, argv=<optimized out>) at
ceph_mon.cc:750
--
Eino Tuominen
-----Original Message-----
From: Gregory Farnum [mailto:[email protected]]
Sent: 31 August 2015 12:26
To: Eino Tuominen
Cc: [email protected]; Kefu Chai; [email protected]
Subject: Re: [ceph-users] Monitor segfault
Oh whoops, can you install the ceph-debug packages as well? That will
provide line numbers on the call sites. :)
-Greg
On Mon, Aug 31, 2015 at 10:25 AM, Eino Tuominen <[email protected]> wrote:
> Hi Greg,
>
> (gdb) bt
> #0 0x00007f071a05020b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00000000009a996d in ?? ()
> #2 <signal handler called>
> #3 0x000000000085ada2 in crush_do_rule ()
> #4 0x00000000007a85cb in OSDMap::_pg_to_osds(pg_pool_t const&, pg_t,
> std::vector<int, std::allocator<int> >*, int*, unsigned int*) const ()
> #5 0x00000000007a8a64 in OSDMap::pg_to_raw_up(pg_t, std::vector<int,
> std::allocator<int> >*, int*) const ()
> #6 0x00000000007ab8f7 in OSDMap::remove_redundant_temporaries(CephContext*,
> OSDMap const&, OSDMap::Incremental*) ()
> #7 0x000000000060fdb9 in OSDMonitor::create_pending() ()
> #8 0x00000000006047b9 in PaxosService::_active() ()
> #9 0x0000000000604ad7 in PaxosService::election_finished() ()
> #10 0x00000000005c34a6 in Monitor::win_election(unsigned int, std::set<int,
> std::less<int>, std::allocator<int> >&, unsigned long, MonCommand const*,
> int, std::set<int, std::less<int>, std::allocator<int> > const*) ()
> #11 0x00000000005c388c in Monitor::win_standalone_election() ()
> #12 0x00000000005c42eb in Monitor::bootstrap() ()
> #13 0x00000000005c4645 in Monitor::init() ()
> #14 0x00000000005769c0 in main ()
>
> -----Original Message-----
> From: Gregory Farnum [mailto:[email protected]]
> Sent: 31 August 2015 11:46
> To: Eino Tuominen
> Cc: [email protected]; Kefu Chai; [email protected]
> Subject: Re: [ceph-users] Monitor segfault
>
> On Mon, Aug 31, 2015 at 9:33 AM, Eino Tuominen <[email protected]> wrote:
>> Hello,
>>
>> I'm getting a segmentation fault from the monitor of our test cluster.
>> The cluster was in a bad state because I had recently removed three hosts
>> from it. I started cleaning it up by first marking the removed OSDs as
>> lost (ceph osd lost) and then removing them from the CRUSH
>> map (ceph osd crush remove). After a few successful commands the cluster
>> ceased to respond. One monitor seemed to stay up (it was responding through
>> the admin socket), so I stopped it and used monmaptool to remove the failed
>> monitor from the monmap. But now the second monitor also segfaults when I
>> try to start it.
>>
>> The cluster does not hold any important data, but I'd like to get the
>> monitors back up for practice. How do I debug this further?
>>
>> Linux cephmon-test-02 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00
>> UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>
>> The output:
>>
>> -2> 2015-08-31 10:28:52.606894 7f8ab493c8c0 0 log_channel(cluster) log
>> [INF] : pgmap v1845959: 6288 pgs: 55 inactive, 153 active, 473 active+clean,
>> 1 stale+active+undersized+degraded+remapped, 455 stale+incomplete, 272
>> peering, 145 stale+down+peering, 6 degraded+remapped, 1
>> active+recovery_wait+degraded, 70 undersized+degraded+remapped, 504
>> incomplete, 206 active+undersized+degraded+remapped, 2
>> stale+active+clean+inconsistent, 101 down+peering, 59
>> active+undersized+degraded+remapped+backfilling, 294 remapped, 11
>> active+undersized+degraded+remapped+wait_backfill, 1264 active+remapped, 5
>> stale+undersized+degraded, 1 active+undersized+remapped, 1
>> stale+active+undersized+degraded, 23 stale+remapped+incomplete, 297
>> remapped+peering, 1 active+remapped+wait_backfill, 1 degraded, 32
>> undersized+degraded, 454 active+undersized+degraded, 7
>> active+recovery_wait+degraded+remapped, 1134 stale+active+clean, 142
>> remapped+incomplete, 115 stale+peering, 3
>> active+recovering+degraded+remapped;
>> 10014 GB data, 5508 GB used, 41981 GB / 47489 GB avail; 33343/19990223
>> objects degraded (0.167%); 45721/19990223 objects misplaced (0.229%)
>> -1> 2015-08-31 10:28:52.606969 7f8ab493c8c0 0 log_channel(cluster) log
>> [INF] : mdsmap e1: 0/0/1 up
>> 0> 2015-08-31 10:28:52.617974 7f8ab493c8c0 -1 *** Caught signal
>> (Segmentation fault) **
>> in thread 7f8ab493c8c0
>>
>> ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>> 1: /usr/bin/ceph-mon() [0x9a98aa]
>> 2: (()+0x10340) [0x7f8ab3a3d340]
>> 3: (crush_do_rule()+0x292) [0x85ada2]
>> 4: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int,
>> std::allocator<int> >*, int*, unsigned int*) const+0xeb) [0x7a85cb]
>> 5: (OSDMap::pg_to_raw_up(pg_t, std::vector<int, std::allocator<int> >*,
>> int*) const+0x94) [0x7a8a64]
>> 6: (OSDMap::remove_redundant_temporaries(CephContext*, OSDMap const&,
>> OSDMap::Incremental*)+0x317) [0x7ab8f7]
>> 7: (OSDMonitor::create_pending()+0xf69) [0x60fdb9]
>> 8: (PaxosService::_active()+0x709) [0x6047b9]
>> 9: (PaxosService::election_finished()+0x67) [0x604ad7]
>> 10: (Monitor::win_election(unsigned int, std::set<int, std::less<int>,
>> std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int,
>> std::less<int>, std::allocator<int> > const*)
>> +0x236) [0x5c34a6]
>> 11: (Monitor::win_standalone_election()+0x1cc) [0x5c388c]
>> 12: (Monitor::bootstrap()+0x9bb) [0x5c42eb]
>> 13: (Monitor::init()+0xd5) [0x5c4645]
>> 14: (main()+0x2470) [0x5769c0]
>> 15: (__libc_start_main()+0xf5) [0x7f8ab1ec7ec5]
>> 16: /usr/bin/ceph-mon() [0x5984f7]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
>> interpret this.
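
[Editor's sketch, not part of the original message.] The numbered frames in the crash dump above follow a regular shape: an index, then either "(symbol+0xOFFSET)" or a bare binary path, then the return address in square brackets. A minimal Python sketch for pulling those fields out is shown here. It assumes each frame sits on a single line (frames wrapped by the mail client would need to be rejoined first), and the names FRAME_RE and parse_frame are illustrative only and are not part of Ceph.

```python
import re

# Matches e.g. "3: (crush_do_rule()+0x292) [0x85ada2]"
# and          "1: /usr/bin/ceph-mon() [0x9a98aa]".
FRAME_RE = re.compile(
    r"^\s*(?P<index>\d+):\s+"
    r"(?:\((?P<symbol>.*)\+0x(?P<offset>[0-9a-f]+)\)|(?P<binary>\S+))\s+"
    r"\[0x(?P<address>[0-9a-f]+)\]\s*$"
)


def parse_frame(line):
    """Return the frame fields as a dict, or None if the line is not a frame."""
    # Drop leading mail quote markers (">", ">>") and whitespace.
    text = re.sub(r"^[>\s]+", "", line)
    match = FRAME_RE.match(text)
    if match is None:
        return None
    offset = match.group("offset")
    return {
        "index": int(match.group("index")),
        # Frames without a symbol only carry the binary path.
        "symbol": match.group("symbol") or match.group("binary"),
        "offset": int(offset, 16) if offset else None,
        "address": int(match.group("address"), 16),
    }


frame = parse_frame(">> 3: (crush_do_rule()+0x292) [0x85ada2]")
print(frame["symbol"], hex(frame["address"]))
```

The extracted addresses can then be cross-checked against the gdb frames or against disassembly of the same binary, as the NOTE above suggests.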
>
> Can you get a core dump, open it in gdb, and provide the output of the
> "backtrace" command?
>
> The cluster is for some reason trying to create new PGs and something
> is going wrong; I suspect the monitors aren't handling the loss of PGs
> properly. :/
> -Greg
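
[Editor's sketch, not part of the original message.] The core-dump step requested above can be scripted so the backtrace is captured the same way every time. The sketch below runs gdb non-interactively using its --batch and -ex options. The executable path /usr/bin/ceph-mon comes from the crash dump in this thread; the core file location is an assumption and has to be supplied by whoever runs it, and the function names are illustrative only.

```python
import subprocess


def gdb_backtrace_command(executable, core_file, all_threads=False):
    """Build the gdb argv that prints a backtrace from a core file and exits."""
    # "bt" covers the crashing thread; "thread apply all bt" covers every thread.
    bt_command = "thread apply all bt" if all_threads else "bt"
    return ["gdb", "--batch", "-ex", bt_command, executable, core_file]


def collect_backtrace(executable, core_file, all_threads=False):
    """Run gdb and return whatever it wrote to stdout (requires gdb installed)."""
    cmd = gdb_backtrace_command(executable, core_file, all_threads)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Example call; "core" is a placeholder path, not taken from this thread:
#   print(collect_backtrace("/usr/bin/ceph-mon", "core"))
```

Without the debug symbol packages installed, the frames come back as bare addresses and "??" entries, so the symbols need to be present before running this.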