Hi, below is the mds dump:
dumped mdsmap epoch 1799
epoch 1799
flags 0
created 2014-12-10 12:44:34.188118
modified 2015-05-04 07:16:37.205350
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 1794
last_failure_osd_epoch 21750
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
max_mds 1
in 0
up {0=5827504}
failed
stopped
data_pools 0
metadata_pool 1
inline_data disabled
5827504: 10.20.0.11:6800/3382530 'ceph1' mds.0.262 up:rejoin seq 33159
The active+clean+replay state has been there for a day now, so something must
not be ok, since it should have gone away in a couple of minutes.
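
For reference, a minimal way to keep an eye on this while waiting for the mds
to leave up:rejoin (a rough sketch only, assuming the ceph CLI is on the PATH
and the plain-text 'ceph mds dump' output shown above):

#!/usr/bin/env python
# Rough sketch: poll 'ceph mds dump' until no rank is still in up:rejoin.
# Assumes the Hammer-era plain-text output format shown above.
import subprocess
import time

def mds_states():
    out = subprocess.check_output(["ceph", "mds", "dump"]).decode()
    # Rank lines look like:
    # 5827504: 10.20.0.11:6800/3382530 'ceph1' mds.0.262 up:rejoin seq 33159
    return [tok for line in out.splitlines()
            for tok in line.split() if tok.startswith("up:")]

while True:
    states = mds_states()
    print("mds states: %s" % (", ".join(states) or "none"))
    if states and all(s == "up:active" for s in states):
        break
    time.sleep(10)
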
Thanks
Tuomas
-----Original Message-----
From: Sage Weil [mailto:[email protected]]
Sent: 4 May 2015 18:29
To: Tuomas Juntunen
Cc: [email protected]; [email protected]
Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after some basic
operations most of the OSD's went down
On Mon, 4 May 2015, Tuomas Juntunen wrote:
> Hi
>
> Thanks Sage, I got it working now. Everything else seems to be ok,
> except mds is reporting "mds cluster is degraded", not sure what could be wrong.
> Mds is running and all osds are up and pg's are active+clean and
> active+clean+replay.
Great! The 'replay' part should clear after a minute or two.
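
If it does not clear, a quick way to see which PGs are still carrying the
replay flag is to filter 'ceph pg dump'. A rough sketch only, assuming the
plain-text dump format where the pgid is the first column and the state shows
up as something like active+clean+replay:

#!/usr/bin/env python
# Rough sketch: print PG ids whose state still contains 'replay'.
# Assumes plain-text 'ceph pg dump' output with the pgid in the first column.
import subprocess

out = subprocess.check_output(["ceph", "pg", "dump"]).decode()
for line in out.splitlines():
    cols = line.split()
    if len(cols) > 1 and any("replay" in col for col in cols):
        print(cols[0])
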
> Had to delete some empty pools which were created while the osd's were
> not working and recovery started to go through.
>
> Seems mds is not that stable; this isn't the first time it has gone degraded.
> Before it just started to work again, but now I can't get it back to working.
What does 'ceph mds dump' say?
sage
>
> Thanks
>
> Br,
> Tuomas
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: 1 May 2015 21:14
> To: Sage Weil
> Cc: tuomas.juntunen; [email protected];
> [email protected]
> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and after some
> basic operations most of the OSD's went down
>
> Thanks, I'll do this when the commit is available and report back.
>
> And indeed, I'll change to the official ones after everything is ok.
>
> Br,
> Tuomas
>
> > On Fri, 1 May 2015, [email protected] wrote:
> >> Hi
> >>
> >> I deleted the images and img pools and started the osd's; they still die.
> >>
> >> Here's a log of one of the osd's after this, if you need it.
> >>
> >> http://beta.xaasbox.com/ceph/ceph-osd.19.log
> >
> > I've pushed another commit that should avoid this case, sha1
> > 425bd4e1dba00cc2243b0c27232d1f9740b04e34.
> >
> > Note that once the pools are fully deleted (shouldn't take too long
> > once the osds are up and stabilize) you should switch back to the
> > normal packages that don't have these workarounds.
> >
> > sage
> >
> >
> >
> >>
> >> Br,
> >> Tuomas
> >>
> >>
> >> > Thanks man. I'll try it tomorrow. Have a good one.
> >> >
> >> > Br,T
> >> >
> >> > -------- Original message --------
> >> > From: Sage Weil <[email protected]>
> >> > Date: 30/04/2015 18:23 (GMT+02:00)
> >> > To: Tuomas Juntunen <[email protected]>
> >> > Cc: [email protected], [email protected]
> >> > Subject: RE: [ceph-users] Upgrade from Giant to Hammer and after
> >> > some basic operations most of the OSD's went down
> >> >
> >> > On Thu, 30 Apr 2015, [email protected] wrote:
> >> >> Hey
> >> >>
> >> >> Yes, I can drop the images data. Do you think this will fix it?
> >> >
> >> > It's a slightly different assert that (I believe) should not
> >> > trigger once the pool is deleted. Please give that a try and if
> >> > you still hit it I'll whip up a workaround.
> >> >
> >> > Thanks!
> >> > sage
> >> >
> >> > >
> >> >>
> >> >> Br,
> >> >>
> >> >> Tuomas
> >> >>
> >> >> > On Wed, 29 Apr 2015, Tuomas Juntunen wrote:
> >> >> >> Hi
> >> >> >>
> >> >> >> I updated that version and it seems that something did happen:
> >> >> >> the osd's stayed up for a while and 'ceph status' got updated.
> >> >> >> But then in a couple of minutes, they all went down the same way.
> >> >> >>
> >> >> >> I have attached a new 'ceph osd dump -f json-pretty' and got a
> >> >> >> new log from one of the osd's with osd debug = 20:
> >> >> >> http://beta.xaasbox.com/ceph/ceph-osd.15.log
> >> >> >
> >> >> > Sam mentioned that you had said earlier that this was not
> >> >> > critical data?  If not, I think the simplest thing is to just
> >> >> > drop those pools.  The important thing (from my perspective at
> >> >> > least :) is that we understand the root cause and can prevent
> >> >> > this in the future.
> >> >> >
> >> >> > sage
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> Thank you!
> >> >> >>
> >> >> >> Br,
> >> >> >> Tuomas
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> -----Original Message-----
> >> >> >> From: Sage Weil [mailto:[email protected]]
> >> >> >> Sent: 28 April 2015 23:57
> >> >> >> To: Tuomas Juntunen
> >> >> >> Cc: [email protected]; [email protected]
> >> >> >> Subject: Re: [ceph-users] Upgrade from Giant to Hammer and
> >> >> >> after some
> >> basic
> >> >> >> operations most of the OSD's went down
> >> >> >>
> >> >> >> Hi Tuomas,
> >> >> >>
> >> >> >> I've pushed an updated wip-hammer-snaps branch.  Can you please
> >> >> >> try it?  The build will appear here
> >> >> >>
> >> >> >> http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/sha1/08bf531331afd5e2eb514067f72afda11bcde286
> >> >> >>
> >> >> >> (or a similar url; adjust for your distro).
> >> >> >>
> >> >> >> Thanks!
> >> >> >> sage
> >> >> >>
> >> >> >>
> >> >> >> On Tue, 28 Apr 2015, Sage Weil wrote:
> >> >> >>
> >> >> >> > [adding ceph-devel]
> >> >> >> >
> >> >> >> > Okay, I see the problem.  This seems to be unrelated to the
> >> >> >> > giant -> hammer move... it's a result of the tiering changes
> >> >> >> > you made:
> >> >> >> >
> >> >> >> > > > > > > > The following:
> >> >> >> > > > > > > >
> >> >> >> > > > > > > > ceph osd tier add img images --force-nonempty
> >> >> >> > > > > > > > ceph osd tier cache-mode images forward
> >> >> >> > > > > > > > ceph osd tier set-overlay img images
> >> >> >> >
> >> >> >> > Specifically, --force-nonempty bypassed important safety checks.
> >> >> >> >
> >> >> >> > 1. images had snapshots (and removed_snaps)
> >> >> >> >
> >> >> >> > 2. images was added as a tier *of* img, and img's
> >> >> >> > removed_snaps was copied to images, clobbering the
> >> >> >> > removed_snaps value (see
> >> >> >> > OSDMap::Incremental::propagate_snaps_to_tiers)
> >> >> >> >
> >> >> >> > 3. tiering relation was undone, but removed_snaps was still
> >> >> >> > gone
> >> >> >> >
> >> >> >> > 4. on OSD startup, when we load the PG, removed_snaps is
> >> >> >> > initialized with the older map.  later, in PGPool::update(),
> >> >> >> > we assume that removed_snaps always grows (never shrinks) and
> >> >> >> > we trigger an assert.
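
For anyone curious, a toy illustration (plain Python, not Ceph's actual
interval_set code) of why a shrinking removed_snaps trips that assert:
PGPool::update() effectively subtracts the previously cached set from the new
one, and erasing intervals that are no longer present drives the tracked size
negative:

# Toy model of the failure mode described above -- NOT Ceph's real code.
# An interval set that tracks its total size and, like interval_set<T>::erase(),
# asserts that the size never goes negative.
class IntervalSet(object):
    def __init__(self, intervals=()):
        self.m = dict(intervals)            # {start: length}
        self._size = sum(self.m.values())

    def erase(self, start, length):
        # In the real erase() the interval must be present; here we just
        # shrink the size and let the assert catch the mismatch.
        have = self.m.pop(start, 0)
        self._size -= length
        assert self._size >= 0, "FAILED assert(_size >= 0)"
        if have > length:
            self.m[start + length] = have - length

    def subtract(self, other):
        for start, length in other.m.items():
            self.erase(start, length)

# removed_snaps cached by the PG before the tiering change: snaps 1..8 removed.
cached_removed_snaps = IntervalSet([(1, 8)])
# removed_snaps in the new map after being clobbered by the empty 'img' pool.
new_removed_snaps = IntervalSet([])

# "newly removed = new - cached": the new set shrank, so we erase intervals we
# never had and the size check blows up, mirroring the FAILED
# assert(_size >= 0) in the osd logs quoted further down.
new_removed_snaps.subtract(cached_removed_snaps)    # raises AssertionError
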
> >> >> >> >
> >> >> >> > To fix this I think we need to do 2 things:
> >> >> >> >
> >> >> >> > 1. make the OSD forgiving of removed_snaps getting smaller.
> >> >> >> > This is probably a good thing anyway: once we know snaps are
> >> >> >> > removed on all OSDs we can prune the interval_set in the
> >> >> >> > OSDMap.  Maybe.
> >> >> >> >
> >> >> >> > 2. Fix the mon to prevent this from happening, *even* when
> >> >> >> > --force-nonempty is specified. (This is the root cause.)
> >> >> >> >
> >> >> >> > I've opened http://tracker.ceph.com/issues/11493 to track this.
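
Until the mon-side guard exists, a cheap manual pre-flight check before any
'ceph osd tier add ... --force-nonempty' is to refuse when either pool already
shows snapshot history in the osdmap. A rough sketch only, assuming the
Hammer-era plain-text 'ceph osd dump' output where pool lines carry a
removed_snaps field once snaps have been deleted:

#!/usr/bin/env python
# Rough pre-flight check: warn if either pool already has removed_snaps,
# since tiering them together would clobber that history (see above).
# Usage (hypothetical script name and pool names): check_tier_safety.py img images
import subprocess
import sys

base, cache = sys.argv[1], sys.argv[2]

out = subprocess.check_output(["ceph", "osd", "dump"]).decode()
risky = [line.strip() for line in out.splitlines()
         if line.startswith("pool ")
         and ("'%s'" % base in line or "'%s'" % cache in line)
         and "removed_snaps" in line]

if risky:
    print("refusing: snapshot history present, tier add would clobber removed_snaps:")
    for line in risky:
        print("  " + line)
    sys.exit(1)
print("no removed_snaps found on %s or %s" % (base, cache))
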
> >> >> >> >
> >> >> >> > sage
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > > > > > > >
> >> >> >> > > > > > > > Idea was to make images a tier of img, move the
> >> >> >> > > > > > > > data to img, then change clients to use the new
> >> >> >> > > > > > > > img pool.
> >> >> >> > > > > > > >
> >> >> >> > > > > > > > Br,
> >> >> >> > > > > > > > Tuomas
> >> >> >> > > > > > > >
> >> >> >> > > > > > > > > Can you explain exactly what you mean by:
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > > "Also I created one pool for tier to be able
> >> >> >> > > > > > > > > to move data without
> >> >> >> > > > > > > outage."
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > > -Sam
> >> >> >> > > > > > > > > ----- Original Message -----
> >> >> >> > > > > > > > > From: "tuomas juntunen"
> >> >> >> > > > > > > > > <[email protected]>
> >> >> >> > > > > > > > > To: "Ian Colle" <[email protected]>
> >> >> >> > > > > > > > > Cc: [email protected]
> >> >> >> > > > > > > > > Sent: Monday, April 27, 2015 4:23:44 AM
> >> >> >> > > > > > > > > Subject: Re: [ceph-users] Upgrade from Giant
> >> >> >> > > > > > > > > to Hammer and after some basic operations
> >> >> >> > > > > > > > > most of the OSD's went down
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > > Hi
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > > Any solution for this yet?
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > > Br,
> >> >> >> > > > > > > > > Tuomas
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > >> It looks like you may have hit
> >> >> >> > > > > > > > >> http://tracker.ceph.com/issues/7915
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> Ian R. Colle
> >> >> >> > > > > > > > >> Global Director of Software Engineering
> >> >> >> > > > > > > > >> Red Hat (Inktank is now part of Red Hat!)
> >> >> >> > > > > > > > >> http://www.linkedin.com/in/ircolle
> >> >> >> > > > > > > > >> http://www.twitter.com/ircolle
> >> >> >> > > > > > > > >> Cell: +1.303.601.7713
> >> >> >> > > > > > > > >> Email: [email protected]
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> ----- Original Message -----
> >> >> >> > > > > > > > >> From: "tuomas juntunen"
> >> >> >> > > > > > > > >> <[email protected]>
> >> >> >> > > > > > > > >> To: [email protected]
> >> >> >> > > > > > > > >> Sent: Monday, April 27, 2015 1:56:29 PM
> >> >> >> > > > > > > > >> Subject: [ceph-users] Upgrade from Giant to
> >> >> >> > > > > > > > >> Hammer and after some basic operations most
> >> >> >> > > > > > > > >> of the OSD's went down
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> I upgraded Ceph from 0.87 Giant to 0.94.1
> >> >> >> > > > > > > > >> Hammer
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> Then created new pools and deleted some old
> >> >> >> > > > > > > > >> ones. Also I created one pool for tier to be
> >> >> >> > > > > > > > >> able to move data without outage.
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> After these operations all but 10 OSD's are
> >> >> >> > > > > > > > >> down and writing these kinds of messages to the
> >> >> >> > > > > > > > >> logs; I get more than 100gb of these in a night:
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> -19> 2015-04-27 10:17:08.808584 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started
> >> >> >> > > > > > > > >> -18> 2015-04-27 10:17:08.808596 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Start
> >> >> >> > > > > > > > >> -17> 2015-04-27 10:17:08.808608 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> >> >> >> > > > > > > > >> -16> 2015-04-27 10:17:08.808621 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] exit Start 0.000025 0 0.000000
> >> >> >> > > > > > > > >> -15> 2015-04-27 10:17:08.808637 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[0.189( v 8480'7 (0'0,8480'7] local-les=16609 n=0 ec=1 les/c 16609/16659 16590/16590/16590) [24,3,23] r=2 lpr=17838 pi=15659-16589/42 crt=8480'7 lcod 0'0 inactive NOTIFY] enter Started/Stray
> >> >> >> > > > > > > > >> -14> 2015-04-27 10:17:08.808796 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Reset 0.119467 4 0.000037
> >> >> >> > > > > > > > >> -13> 2015-04-27 10:17:08.808817 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started
> >> >> >> > > > > > > > >> -12> 2015-04-27 10:17:08.808828 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Start
> >> >> >> > > > > > > > >> -11> 2015-04-27 10:17:08.808838 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] state<Start>: transitioning to Stray
> >> >> >> > > > > > > > >> -10> 2015-04-27 10:17:08.808849 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] exit Start 0.000020 0 0.000000
> >> >> >> > > > > > > > >> -9> 2015-04-27 10:17:08.808861 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[10.181( empty local-les=17879 n=0 ec=17863 les/c 17879/17879 17863/17863/17863) [25,5,23] r=2 lpr=17879 crt=0'0 inactive NOTIFY] enter Started/Stray
> >> >> >> > > > > > > > >> -8> 2015-04-27 10:17:08.809427 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Reset 7.511623 45 0.000165
> >> >> >> > > > > > > > >> -7> 2015-04-27 10:17:08.809445 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started
> >> >> >> > > > > > > > >> -6> 2015-04-27 10:17:08.809456 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Start
> >> >> >> > > > > > > > >> -5> 2015-04-27 10:17:08.809468 7fd8e748d700  1 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] state<Start>: transitioning to Primary
> >> >> >> > > > > > > > >> -4> 2015-04-27 10:17:08.809479 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] exit Start 0.000023 0 0.000000
> >> >> >> > > > > > > > >> -3> 2015-04-27 10:17:08.809492 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary
> >> >> >> > > > > > > > >> -2> 2015-04-27 10:17:08.809502 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 inactive] enter Started/Primary/Peering
> >> >> >> > > > > > > > >> -1> 2015-04-27 10:17:08.809513 7fd8e748d700  5 osd.23 pg_epoch: 17882 pg[2.189( empty local-les=16127 n=0 ec=1 les/c 16127/16344 16125/16125/16125) [23,5] r=0 lpr=17838 crt=0'0 mlcod 0'0 peering] enter Started/Primary/Peering/GetInfo
> >> >> >> > > > > > > > >> 0> 2015-04-27 10:17:08.813837 7fd8e748d700 -1 ./include/interval_set.h: In function 'void interval_set<T>::erase(T, T) [with T = snapid_t]' thread 7fd8e748d700 time 2015-04-27 10:17:08.809899
> >> >> >> > > > > > > > >> ./include/interval_set.h: 385: FAILED assert(_size >= 0)
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
> >> >> >> > > > > > > > >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc271b]
> >> >> >> > > > > > > > >> 2: (interval_set<snapid_t>::subtract(interval_set<snapid_t> const&)+0xb0) [0x82cd50]
> >> >> >> > > > > > > > >> 3: (PGPool::update(std::tr1::shared_ptr<OSDMap const>)+0x52e) [0x80113e]
> >> >> >> > > > > > > > >> 4: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x282) [0x801652]
> >> >> >> > > > > > > > >> 5: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2c3) [0x6b0e43]
> >> >> >> > > > > > > > >> 6: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x21c) [0x6b191c]
> >> >> >> > > > > > > > >> 7: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x18) [0x709278]
> >> >> >> > > > > > > > >> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0xbb38ae]
> >> >> >> > > > > > > > >> 9: (ThreadPool::WorkThread::entry()+0x10) [0xbb4950]
> >> >> >> > > > > > > > >> 10: (()+0x8182) [0x7fd906946182]
> >> >> >> > > > > > > > >> 11: (clone()+0x6d) [0x7fd904eb147d]
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> Also by monitoring (ceph -w) I get the
> >> >> >> > > > > > > > >> following messages, also lots of them.
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> 2015-04-27 10:39:52.935812 mon.0 [INF] from='client.? 10.20.0.13:0/1174409' entity='osd.30' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 30, "weight": 1.82}]: dispatch
> >> >> >> > > > > > > > >> 2015-04-27 10:39:53.297376 mon.0 [INF] from='client.? 10.20.0.13:0/1174483' entity='osd.26' cmd=[{"prefix": "osd crush create-or-move", "args": ["host=ceph3", "root=default"], "id": 26, "weight": 1.82}]: dispatch
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> This is a cluster of 3 nodes with 36 OSD's;
> >> >> >> > > > > > > > >> the nodes are also mons and mds's to save
> >> >> >> > > > > > > > >> servers. All run Ubuntu 14.04.2.
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> I have pretty much tried everything I could
> >> >> >> > > > > > > > >> think of.
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> Restarting daemons doesn't help.
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> Any help would be appreciated. I can also
> >> >> >> > > > > > > > >> provide more logs if necessary; they just
> >> >> >> > > > > > > > >> seem to get pretty large in a few moments.
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >> Thank you
> >> >> >> > > > > > > > >> Tuomas
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >>
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > > >
> >> >> >> > > > > > > >
> >> >> >> > > > > > > >
> >> >> >> > > > > > > >
> >> >> >> > > > > > > >
> >> >> >> > > > > > >
> >> >> >> > > > > >
> >> >> >> > > > > >
> >> >> >> > > > >
> >> >> >> > > > >
> >> >> >> > > >
> >> >> >> > >
> >> >> >> > >
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
>
>
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com