On Fri, 14 Jul 2017, Joao Eduardo Luis wrote:
> Dear all,
> 
> 
> The current upgrade procedure to jewel, as stated by the RC's release notes,

You mean (jewel or kraken) -> luminous, I assume...

> can be boiled down to
> 
> - upgrade all monitors first
> - upgrade osds only after we have a **full** quorum of luminous
> monitors, comprising all the monitors in the monmap (i.e., once the
> 'luminous' feature is enabled in the monmap).
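> 
> The check we have been scripting for this step looks roughly like the
> following (just a sketch: it assumes a luminous 'ceph' CLI that
> provides 'ceph mon feature ls', and the text parsing is a guess at the
> exact output layout):
> 
>   import subprocess
>   import sys
> 
>   out = subprocess.check_output(['ceph', 'mon', 'feature', 'ls']).decode()
> 
>   # Only look at the section describing the current monmap; the
>   # 'all features' section lists luminous on any new binary anyway.
>   in_monmap_section = False
>   monmap_is_luminous = False
>   for line in out.splitlines():
>       if 'current monmap' in line:
>           in_monmap_section = True
>       elif in_monmap_section and 'persistent' in line and 'luminous' in line:
>           monmap_is_luminous = True
> 
>   if not monmap_is_luminous:
>       sys.exit("monmap does not have the 'luminous' feature yet; "
>                "do not restart OSDs")
>   print("monmap is luminous; OSD restarts should be safe")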
> 
> While this is reasonable in principle (it cuts down the number of
> upgrade combinations to test, and it is a simple enough procedure from
> Ceph's point of view), it does not seem to match how users actually
> perform upgrades.
> 
> As far as I can tell, it's not uncommon for users to take this
> maintenance window to perform system-wide upgrades, including the
> kernel and glibc for instance, and to finish the upgrade with a reboot.
> 
> The problem with our current upgrade procedure is that once the first
> server reboots, the osds on that server will be unable to boot, as the
> monitor quorum will not yet be 'luminous'.
> 
> The only way to minimize potential downtime is to upgrade and restart
> all the nodes at the same time, which can be daunting and basically
> defeats the purpose of a rolling upgrade. And in this scenario there is
> an expectation of downtime, something Ceph is built to prevent.
> 
> Additionally, requiring the `luminous` feature to be enabled in the
> quorum becomes even less realistic in the face of possible failures.
> God forbid that, in the middle of upgrading, the last remaining monitor
> server dies a horrible death - e.g., power, network. We'll still be
> left with a 'not-luminous' quorum and a bunch of OSDs waiting for this
> flag to be flipped, and now it's a race to either get that monitor back
> up or remove it from the monmap.
> 
> Even if one were to decide to upgrade only the system packages, reboot,
> and then upgrade the Ceph packages, there is the unfortunate
> possibility that library interdependencies would require Ceph's
> binaries to be updated, so this may be a show-stopper as well.
> 
> Alternatively, if one simply upgrades the system without rebooting and
> then proceeds with the upgrade procedure, one is still in a fragile
> position: if, for some reason, one of the nodes reboots, we're back in
> the same precarious situation as before.
> 
> Personally, I can see a few ways out of this, at different points on
> the reasonableness spectrum:
> 
> 1. add temporary monitor nodes to the cluster, be they on VMs or bare
> hardware, already running luminous, and then remove the same number of
> monitors from the cluster; this leaves us with a single monitor node to
> upgrade (see the sketch after this list). The drawback is that folks
> may not have spare nodes to run the monitors on, or may have to run
> them on VMs -- which could affect their performance during the upgrade
> window, and increase complexity in terms of firewall and routing rules.
> 
> 2. migrate/upgrade all nodes on which monitors are located first, but
> only restart them after all nodes have been upgraded. If anything goes
> wrong, one can hurry through this step or fall back to option 3.
> 
> 3. reduce the monitor quorum to 1. This pains me to even think about,
> and it bothers me to bits that I find myself even considering it a
> reasonable possibility. It shouldn't, because it isn't. But it's a lot
> more realistic than expecting OSD downtime during an upgrade procedure.
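> 
> For option 1, the sketch mentioned above boils down to something like
> this (again hedged; it assumes the temporary mons are already
> installed, mkfs'ed and running luminous, and the names and addresses
> below are made up):
> 
>   import subprocess
> 
>   # hypothetical temporary luminous monitors, and the old mons to retire
>   new_mons = {'tmp-a': '10.0.0.11:6789', 'tmp-b': '10.0.0.12:6789'}
>   old_mons = ['mon-b', 'mon-c']
> 
>   for name, addr in new_mons.items():
>       # let each temporary monitor join the monmap
>       subprocess.check_call(['ceph', 'mon', 'add', name, addr])
> 
>   for name in old_mons:
>       # retire the same number of old monitors, one at a time
>       subprocess.check_call(['ceph', 'mon', 'remove', name])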
> 
> On top of all this, I found during my tests that any OSD running
> luminous prior to the quorum being luminous will need to be restarted
> before it can properly boot into the cluster. I'm guessing this is a
> bug rather than a feature, though.

That sounds like a bug.. probably didn't subscribe to map updates from 
_start_boot() or something.  Can you open an immediate ticket?

> Any thoughts on how to mitigate this, or on whether I got this all wrong and
> am missing a crucial detail that blows this wall of text away, please let me
> know.

I don't know; the requirement that mons be upgraded before OSDs doesn't 
seem that unreasonable to me.  That might be slightly more painful in a 
hyperconverged scenario (osds and mons on the same host), but it should 
just require some admin TLC (restart mon daemons instead of 
rebooting).
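
For the hyperconverged case, the TLC I have in mind is roughly this 
(just a sketch; it assumes systemd units named ceph-mon@<short hostname> 
and that 'ceph quorum_status' reports a 'quorum_names' list):

  import json
  import socket
  import subprocess
  import time

  mon_id = socket.gethostname().split('.')[0]

  # bounce only the mon daemon on this host instead of rebooting it
  subprocess.check_call(['systemctl', 'restart', 'ceph-mon@%s' % mon_id])

  # wait for the mon to rejoin the quorum before touching the next host
  while True:
      out = subprocess.check_output(['ceph', 'quorum_status']).decode()
      if mon_id in json.loads(out).get('quorum_names', []):
          break
      time.sleep(2)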

Also, for large clusters, users often have mons on dedicated hosts.  And 
for small clusters even the sloppy "just reboot" approach will have a 
smaller impact.

Is there something in some distros that *requires* a reboot in order to 
upgrade packages?

Also, this seems like it will only affect users that are getting their 
ceph packages from the distro itself and not from a ceph.com channel or a 
special subscription/product channel (this is how the RHEL stuff works, I 
think).

sage

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
