So, this seems to work:

  ceph-objectstore-tool --op list-pgs --data-path /var/lib/ceph/osd/ceph-36/ --journal-path /var/lib/ceph/osd/ceph-36/journal > /tmp/pgs
Examine /tmp/pgs, compare to 'ceph osd pool ls detail', and produce a list of
invalid pgs (i.e., pgs belonging to pools that no longer exist). Then, for
each invalid pg, run:

  ceph-objectstore-tool --op remove --data-path /var/lib/ceph/osd/ceph-36/ --journal-path /var/lib/ceph/osd/ceph-36/journal --pgid $id

(A rough scripted version of this step is sketched after the quoted thread
below.) This OSD is now up and running; I'll start in on the rest of them.

Thanks for the help.

Scott

On Tue, Apr 21, 2015 at 1:04 AM Samuel Just <[email protected]> wrote:
> Yep, you have hit bug 11429. At some point, you removed a pool and then
> restarted these osds. Due to the original bug, 10617, those osds never
> actually removed the pgs in that pool. I'm working on a fix, or you can
> manually remove pgs corresponding to pools which no longer exist from the
> crashing osds using the ceph-objectstore-tool.
> -Sam
>
> ----- Original Message -----
> From: "Scott Laird" <[email protected]>
> To: "Samuel Just" <[email protected]>
> Cc: "Robert LeBlanc" <[email protected]>,
>     "'[email protected]' ([email protected])" <[email protected]>
> Sent: Monday, April 20, 2015 6:13:06 AM
> Subject: Re: [ceph-users] OSDs failing on upgrade from Giant to Hammer
>
> They're kind of big; here are links:
>
> https://dl.dropboxusercontent.com/u/104949139/osdmap
> https://dl.dropboxusercontent.com/u/104949139/ceph-osd.36.log
>
> On Sun, Apr 19, 2015 at 8:42 PM Samuel Just <[email protected]> wrote:
>
> > I have a suspicion about what caused this. Can you restart one of the
> > problem osds with
> >
> >   debug osd = 20
> >   debug filestore = 20
> >   debug ms = 1
> >
> > and attach the resulting log from startup to crash along with the osdmap
> > binary (ceph osd getmap -o <mapfile>).
> > -Sam
> >
> > ----- Original Message -----
> > From: "Scott Laird" <[email protected]>
> > To: "Robert LeBlanc" <[email protected]>
> > Cc: "'[email protected]' ([email protected])" <[email protected]>
> > Sent: Sunday, April 19, 2015 6:13:55 PM
> > Subject: Re: [ceph-users] OSDs failing on upgrade from Giant to Hammer
> >
> > Nope. Straight from 0.87 to 0.94.1. FWIW, at someone's suggestion, I just
> > upgraded the kernel on one of the boxes from 3.14 to 3.18; no improvement.
> > Rebooting didn't help, either. Still failing with the same error in the
> > logs.
> >
> > On Sun, Apr 19, 2015 at 2:06 PM Robert LeBlanc <[email protected]> wrote:
> >
> > Did you upgrade from 0.92? If you did, did you flush the logs before
> > upgrading?
> >
> > On Sun, Apr 19, 2015 at 1:02 PM, Scott Laird <[email protected]> wrote:
> >
> > I'm upgrading from Giant to Hammer (0.94.1), and I'm seeing a ton of OSDs
> > die (and stay dead) with this error in the logs:
> >
> > 2015-04-19 11:53:36.796847 7f61fa900900 -1 osd/OSD.h: In function
> > 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f61fa900900 time
> > 2015-04-19 11:53:36.794951
> > osd/OSD.h: 716: FAILED assert(ret)
> >
> > ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> > const*)+0x8b) [0xbc271b]
> > 2: (OSDService::get_map(unsigned int)+0x3f) [0x70923f]
> > 3: (OSD::load_pgs()+0x1769) [0x6c35d9]
> > 4: (OSD::init()+0x71f) [0x6c4c7f]
> > 5: (main()+0x2860) [0x651fc0]
> > 6: (__libc_start_main()+0xf5) [0x7f61f7a3fec5]
> > 7: /usr/bin/ceph-osd() [0x66aff7]
> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > to interpret this.
> >
> > This is on a small cluster, with ~40 OSDs on 5 servers running Ubuntu
> > 14.04. So far, every single server that I've upgraded has had at least
> > one disk that has failed to restart with this error, and one has had
> > several disks in this state.
> >
> > Restarting the OSD after it dies with this doesn't help.
> >
> > I haven't lost any data through this due to my slow rollout, but it's
> > really annoying.
> >
> > Here are two full logs from OSDs on two different machines:
> >
> > https://dl.dropboxusercontent.com/u/104949139/ceph-osd.25.log
> > https://dl.dropboxusercontent.com/u/104949139/ceph-osd.34.log
> >
> > Any suggestions?
> >
> > Scott
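For reference, the compare-and-remove step at the top can be scripted. This is
only a rough sketch, not a polished tool: it assumes bash, that /tmp/pgs holds
one PG ID per line from the list-pgs command, that the OSD daemon is stopped
while ceph-objectstore-tool runs, and that 'ceph osd pool ls detail' prints
lines starting with "pool <id>". The path and variable names are illustrative;
adjust per OSD.

  #!/bin/bash
  # Sketch: remove PGs whose pool no longer exists from one OSD's store.
  # Run only while the OSD daemon is stopped.
  set -euo pipefail

  osd_path=/var/lib/ceph/osd/ceph-36    # illustrative; change per OSD

  # Pool IDs that still exist, taken from lines like "pool 17 'rbd' ...".
  live_pools=$(ceph osd pool ls detail | awk '$1 == "pool" {print $2}')

  while read -r pgid; do
      pool=${pgid%%.*}                  # PG IDs look like <pool>.<hash>
      if ! grep -qx "$pool" <<< "$live_pools"; then
          echo "removing stray pg $pgid from $osd_path"
          ceph-objectstore-tool --op remove \
              --data-path "$osd_path/" \
              --journal-path "$osd_path/journal" \
              --pgid "$pgid"
      fi
  done < /tmp/pgs

Obviously sanity-check the PG IDs it reports against 'ceph osd pool ls detail'
by hand on one OSD before pointing anything like this at the rest of them.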
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
