I would also advise people to mind SELinux if it is enabled on the
OSD nodes. The relabeling should be done as part of the upgrade, and
it is a rather time-consuming process.
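
A minimal sketch of that step (assuming the standard /var/lib/ceph
and /var/log/ceph paths; adjust to your layout):

  # Restore the SELinux contexts on the Ceph directories after the
  # package upgrade; -R recurses, -v prints every relabeled file.
  restorecon -R -v /var/lib/ceph /var/log/ceph

Like the chown mentioned below, this walks every object file on the
OSD, so plan for the extra downtime.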

-----Original Message-----
From: Mart van Santen <m...@greenhost.nl>
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Lessons learned upgrading Hammer -> Jewel
Date: Fri, 15 Jul 2016 10:48:40 +0200

    Hi Wido,

    Thank you, we are currently in the same process, so this
    information is very useful. Can you share why you upgraded from
    Hammer directly to Jewel? Is there a reason to skip Infernalis? A
    hammer -> infernalis -> jewel upgrade seems like the logical path
    to me.

    (We did indeed see the same "Failed to encode map eXXX with
    expected crc" errors when upgrading to the latest Hammer.)

    Regards,

    Mart


    On 07/15/2016 03:08 AM, 席智勇 wrote:
> Good job, thank you for sharing, Wido~ It's very useful~
>
> 2016-07-14 14:33 GMT+08:00 Wido den Hollander <w...@42on.com>:
> 
> > To add, the RGWs upgraded just fine as well.
> >
> > No regions in use here (yet!), so that upgraded as it should.
> >
> > Wido
> >
> > > On 13 July 2016 at 16:56, Wido den Hollander <w...@42on.com> wrote:
> > > Hello,
> > >
> > > The last 3 days I worked at a customer with an 1800-OSD cluster
> > > which had to be upgraded from Hammer 0.94.5 to Jewel 10.2.2.
> > >
> > > The cluster in this case is 99% RGW, but also some RBD.
> > >
> > > I wanted to share some of the things we encountered during this
> > > upgrade.
> > >
> > > All 180 nodes are running CentOS 7.1 on an IPv6-only network.
> > >
> > > ** Hammer Upgrade **
> > > At first we upgraded from 0.94.5 to 0.94.7. This went well, except
> > > for the fact that the monitors got spammed with this kind of
> > > message:
> > >
> > >   "Failed to encode map eXXX with expected crc"
> > >
> > > Some searching on the list brought me to:
> > >
> > >   ceph tell osd.* injectargs -- --clog_to_monitors=false
> > >
> > > This reduced the load on the 5 monitors and made recovery succeed
> > > smoothly.
> > >
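> > > (If you do this, remember to flip it back once the upgrade is
> > > done; true is the default, so this simply restores normal
> > > behaviour:)
> > >
> > >   # Re-enable OSD logging to the monitors once the cluster is
> > >   # healthy again.
> > >   ceph tell osd.* injectargs -- --clog_to_monitors=true
> > >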
> > > ** Monitors to Jewel **
> > > The next step was to upgrade the monitors from Hammer to Jewel.
> > >
> > > Using Salt we upgraded the packages, and afterwards it was simple:
> > >
> > >   killall ceph-mon
> > >   chown -R ceph:ceph /var/lib/ceph
> > >   chown -R ceph:ceph /var/log/ceph
> > >
> > > Now, a systemd quirk: 'systemctl start ceph.target' does not work;
> > > I had to manually enable the monitor and start it:
> > >
> > >   systemctl enable ceph-mon@srv-zmb04-05.service
> > >   systemctl start ceph-mon@srv-zmb04-05.service
> > >
> > > Afterwards the monitors were running just fine.
> > >
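> > > Wrapped up as one script per monitor host it looks roughly like
> > > this (a sketch; deriving the unit name from the short hostname is
> > > an assumption that happens to hold for our naming scheme):
> > >
> > >   #!/bin/bash
> > >   # Upgrade one monitor from Hammer to Jewel; assumes the
> > >   # packages were already updated (e.g. via Salt).
> > >   killall ceph-mon
> > >   # Jewel runs the daemons as user 'ceph', so the data and log
> > >   # trees must change ownership.
> > >   chown -R ceph:ceph /var/lib/ceph /var/log/ceph
> > >   # Enable and start the mon unit for this host.
> > >   systemctl enable ceph-mon@$(hostname -s).service
> > >   systemctl start ceph-mon@$(hostname -s).service
> > >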
> > > ** OSDs to Jewel **
> > > To upgrade the OSDs to Jewel we initially used Salt to update the
> > > packages on all systems to 10.2.2; we then used a shell script
> > > which we ran on one node at a time.
> > >
> > > The failure domain here is 'rack', so we executed this in one
> > > rack, then the next one, etc.
> > >
> > > The script can be found on GitHub:
> > > https://gist.github.com/wido/06eac901bd42f01ca2f4f1a1d76c49a6
> > >
> > > Be aware that the chown can take a long, long, very long time!
> > >
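> > > In outline, a script of this kind does something like the
> > > following (a sketch, assuming the default 'ceph-<id>' OSD
> > > directory names; the gist linked above is the authoritative
> > > version):
> > >
> > >   #!/bin/bash
> > >   # Upgrade the OSDs on one node to Jewel and start them again.
> > >   killall ceph-osd
> > >   # The slow part: every object file changes owner to 'ceph'.
> > >   chown -R ceph:ceph /var/lib/ceph /var/log/ceph
> > >   # Enable and start every OSD that lives on this host.
> > >   for id in $(ls /var/lib/ceph/osd | cut -d- -f2); do
> > >       systemctl enable ceph-osd@${id}.service
> > >       systemctl start ceph-osd@${id}.service
> > >   done
> > >
> > > Wait for the cluster to recover before moving on to the next
> > > rack.
> > >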
> > > We ran into the issue that some OSDs crashed after start, but
> > > after trying again they would start. The crash was in:
> > >
> > >   "void FileStore::init_temp_collections()"
> > >
> > > I reported this in the tracker as I'm not sure what is happening
> > > here: http://tracker.ceph.com/issues/16672
> > >
> > > ** New OSDs with Jewel **
> > > We also had some new nodes which we wanted to add to the Jewel
> > > cluster.
> > >
> > > Using Salt and ceph-disk we ran into a partprobe issue in
> > > combination with ceph-disk. There was already a Pull Request with
> > > the fix, but it was not included in Jewel 10.2.2.
> > >
> > > We manually applied the PR and it fixed our issues:
> > > https://github.com/ceph/ceph/pull/9330
> > >
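> > > One way to apply such a PR by hand (a sketch; the strip level and
> > > the install path are assumptions for a CentOS 7 Jewel package, so
> > > check where the ceph_disk module lives on your systems):
> > >
> > >   # GitHub serves any pull request as a patch when you append
> > >   # .patch to its URL.
> > >   curl -LO https://github.com/ceph/ceph/pull/9330.patch
> > >   # Apply to the installed module, stripping the repo prefix
> > >   # (a/src/ceph-disk/) from the paths inside the patch.
> > >   patch -p4 -d /usr/lib/python2.7/site-packages < 9330.patch
> > >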
> > > Hope this helps other people with their upgrades to Jewel!
> > >
> > > Wido

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com