Hi Josh,
Can you attach one of your OSDmaps with the poison entries? Between
ceph osd getmap 149 -o /tmp/149
ceph osd getmap 155 -o /tmp/155
I should see one of them.
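In case the entries turn up in an epoch between those two rather than in 149 or 155 themselves, a loop along these lines would dump the whole range. This is only a rough sketch: `fetch_osdmaps` and the `CEPH` override variable are names made up here for illustration, and the only ceph call it relies on is the `ceph osd getmap <epoch> -o <file>` form shown above.

```shell
# Hypothetical helper (not part of ceph): dump every OSDMap epoch in
# [first, last] to /tmp/<epoch> using the same getmap call as above.
# CEPH is an assumed override; CEPH=echo prints the commands instead
# of running them, which is handy for a dry run.
fetch_osdmaps() {
    first=$1
    last=$2
    epoch=$first
    while [ "$epoch" -le "$last" ]; do
        # Stop at the first epoch the monitor cannot hand back.
        ${CEPH:-ceph} osd getmap "$epoch" -o "/tmp/$epoch" || return 1
        epoch=$((epoch + 1))
    done
}
```

Running `fetch_osdmaps 149 155` should leave one file per epoch in /tmp, and any one of those that shows the poison entries is fine to attach.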
Thanks!
sage
On Sat, 14 Jan 2012, Josh Pieper wrote:
> Sage Weil wrote:
> > Hi Josh,
> >
> > On Sat, 14 Jan 2012, Josh Pieper wrote:
> > > I just upgraded our test cluster to 0.40, and immediately after
> > > starting up I get asserts in all the OSDs. I've inlined a relevant
> > > backtrace below. Is there anything else that would be useful for
> > > debugging?
> >
> > Are you coming from 0.39 or something older?
>
> I was upgrading from 0.39.
>
> > You might try reverting 4728f4f8e09878c583c65cd882e031d37f8d903e and see
> > if that does it.
> >
> > Can you reproduce it with --debug-osd 10 and --debug-ms 10?
>
> Unfortunately, I cannot seem to reproduce the problem any more.
> Re-upgrading to 0.40 now shows no problem. I've tried to explore the
> range of things I may have done, but with no luck. I had to trash my
> journals in order to downgrade, so some state was lost, which may be
> related to my inability to reproduce it now.
>
> For what it is worth, I believe the problem may have been caused by
> something the 0.40 versions were sending. As I was downgrading back
> to 0.39, the downgraded 0.39 version kept dying with the same error as
> long as one of the 0.40 versions was still up.
>
> I did not know of the ms debugging when I was first investigating, but
> looking through my old data, I have a trace with OSD debug set to 20
> of the 0.39 version dying of the fault:
>
> http://joshp.no-ip.com:8080/20120114-osd-family-error.log.bz2
>
> -Josh
>
>