Control: fixed -1 2.4.4-1

Nishanth Aravamudan <nish.aravamu...@canonical.com> writes:

> I believe this is because the prerm of corosync.service [...]
> unconditionally stops corosync for all Debian and Ubuntu releases
> (as the init script is installed even if unused by systemd). When
> corosync stops, pacemaker fails to connect to corosync (and the
> pacemaker systemd unit file specifies that pacemaker Requires corosync)
> and also stops.
>
> When the postinst for corosync runs [...] corosync will start, but
> there is no connection between corosync starting (systemd or SysV) and
> pacemaker.

Right.

> I think there are two necessary changes to the packaging/upstream to fix
> this:
>
> 1) The systemd unit file should indicate pacemaker is PartOf corosync,
> which will propagate restarts of corosync to pacemaker. This will also
> propagate stops, but as mentioned above, pacemaker already stops when
> corosync stops, so I think it's harmless.

How would this help?  Currently pacemaker.service Requires
corosync.service, which is a stronger (stricter) constraint than PartOf
would be if I read systemd.unit(5) correctly.
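To illustrate the comparison, here is a minimal Python sketch of how I read the two directives. It is a toy model, not systemd's job engine; the function names and the dict layout are made up for this example, and only the unit names come from this bug.

```python
# Toy model of Requires= versus PartOf= as read from systemd.unit(5).
# Hypothetical helper names; a dict maps a unit to the units it lists.

def stopped_with(unit, requires, part_of):
    """Units stopped (or restarted) when `unit` is explicitly stopped."""
    hit = set()
    for table in (requires, part_of):
        for dependent, targets in table.items():
            if unit in targets:
                hit.add(dependent)
    return hit

def started_with(unit, requires):
    """Units pulled in when `unit` is started; PartOf= pulls in nothing."""
    return set(requires.get(unit, ()))

dep = {"pacemaker.service": ["corosync.service"]}

# Stopping corosync takes pacemaker down under either directive.
stop_requires = stopped_with("corosync.service", dep, {})
stop_part_of = stopped_with("corosync.service", {}, dep)

# Only Requires= makes starting pacemaker start corosync as well.
start_requires = started_with("pacemaker.service", dep)
start_part_of = started_with("pacemaker.service", {})

# Starting corosync never starts pacemaker, whichever directive is used.
start_corosync = started_with("corosync.service", dep)
```

In this model PartOf covers a subset of what Requires already does, and neither brings pacemaker back when corosync is merely started again.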

> Additionally, the SysV init file should be updated to check if the
> pacemaker SysV status was running before stopping corosync in the
> restart path and start pacemaker as well after starting corosync.

I don't intend to go there.  If you stop Corosync under Pacemaker,
Pacemaker will fail and the node will be fenced.  Systemd helps with
this by cleanly stopping Pacemaker (and any other service declaring a
Requires relation to Corosync) beforehand; SysV init has no comparable
mechanisms.  And you can't expect the Corosync init script to take care of
all possible dependent services (Pacemaker, DLM, cLVM, corosync-notifyd,
whatever).  This is part of the reason why I don't really support SysV
init in the HA stack.

> 2) d/rules should call dh_installinit with --restart-after-upgrade. This
> is the default in compat >= 10 (2.4.2-3 is still at 9). That will change
> the prerm and postinst to not stop/start the service on upgrade, but
> simply restart it in the postinst (removals will still stop the
> service).

Corosync 2.4.4-1 has switched to compat 11, so this is done.
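The difference between the two maintainer script styles can be sketched roughly as follows. This is a simplified Python simulation under the assumption that an explicit stop or restart of corosync propagates to active units requiring it, while a plain start does not; the helper names are hypothetical and this is not what dpkg or debhelper actually execute.

```python
# Simplified simulation of the upgrade paths discussed above.
# Hypothetical helpers; `active` is the set of running units.

REQUIRED_BY = {"corosync.service": ["pacemaker.service"]}

def stop(unit, active):
    """Explicit stop: the unit and every unit requiring it go down."""
    active.discard(unit)
    for dependent in REQUIRED_BY.get(unit, ()):
        active.discard(dependent)

def start(unit, active):
    """Explicit start: previously stopped dependents stay stopped."""
    active.add(unit)

def restart(unit, active):
    """Explicit restart: dependents that were active come back too."""
    was_active = {d for d in REQUIRED_BY.get(unit, ()) if d in active}
    stop(unit, active)
    start(unit, active)
    active |= was_active

def upgrade(restart_after_upgrade):
    """Return the units still running after a simulated upgrade."""
    active = {"corosync.service", "pacemaker.service"}
    if restart_after_upgrade:
        restart("corosync.service", active)  # postinst only
    else:
        stop("corosync.service", active)     # old-style prerm
        start("corosync.service", active)    # old-style postinst
    return active

old_style = upgrade(restart_after_upgrade=False)
new_style = upgrade(restart_after_upgrade=True)
```

Under these assumptions the old stop/start pair leaves only corosync running, while the single restart keeps both units up.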

> Now, neither of these actually fixes the existing packages unfortunately,
> which will stop corosync on the upgrade to a fixed package and thus
> stop pacemaker. I have no idea if there actually is any way to fix this
> for existing packages, since the 'old' prerm will be invoked by dpkg on
> the upgrade path.

I don't find this too serious a problem.  Inconvenient, yes, but if
you're running Corosync, then you probably have a highly available setup
where even a prolonged node outage does not lead to service interruption.
Your monitoring system delivers a warning, you start Pacemaker or reboot
and everything is back to normal.

Anders Kaseorg <ande...@mit.edu> writes:

> This just bit me on a Stretch cluster when upgrading corosync from 2.4.2-3 
> to 2.4.2-3+deb9u1.  Marking as such.

I really should have put a warning about this into the DSA.

> Please apply the suggested fixes as soon as possible.

See above; I'm really not sure about fixing this in stable.  Changing
the restart behavior would be possible, but doing an update just for
this would be silly, because the old prerm would stop Corosync one
last time anyway.
-- 
Regards,
Feri
