Package: tech-ctte

Given our discussion at the last CTTE meeting, I am turning my request
for advice into a formal one.

Most of the /usr-move that is happening via DEP17 seems to be working
out, but the effects of Conflicts raise the question of what kinds of
interactions with a package manager are considered supported.

A naive reading of Debian policy 7.4 suggests that declaring a conflict
reliably prevents concurrent unpack:

| When one binary package declares a conflict with another using a
| Conflicts field, dpkg will refuse to allow them to be unpacked on
| the system at the same time.

If you account for the effects of aliasing, this turns out to be too
naive a reading, as dpkg actually allows unpacking a conflicting package
if the other package is scheduled for removal. Normally, this exception
should not have observable consequences, but aliasing makes it
observable in the form of file loss. I have filed #1057199 to clarify
debian-policy.

I subsequently triggered the discussion of what kinds of upgrades we
consider supported on debian-de...@lists.debian.org and invited the CTTE
for feedback (20231221094157.ga2753...@subdivi.de). While there is a
preliminary conclusion, subsequent discussion in the last CTTE meeting
and elsewhere has shown some disagreement.

The relevant situation is not entirely trivial to construct:

 * Package $first contains an aliased file $file and this is moved to
   package $second in an update.
   OR
   Package $first diverts aliased location $file normally owned by
   package $second.

 * An update to package $second moves $file to its physical location
   below /usr.

 * Package $second declares a versioned conflict for package $first with
   any version that contains or diverts the aliased $file.

Then we can construct a file loss scenario:

 * Install package $first.
 * Schedule $first for removal:
   echo "$first remove" | dpkg --set-selections
 * Install the updated $second:
   dpkg --unpack $second.deb

In that last step, dpkg will unpack $second despite the conflicted
package $first still being unpacked. After performing the unpack, it
will remove package $first to honour the declared conflict. That removal
deletes the aliased $file, and because the aliased and physical paths
refer to the same file, it also deletes the updated $file at the
physical location below /usr owned by package $second.
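
The underlying mechanism can be reproduced without dpkg in a scratch
directory. The following is only a minimal sketch: the scratch directory
stands in for /, and the file name "tool" is a hypothetical placeholder
for $file.

```shell
#!/bin/sh
# Sketch: deleting an aliased path also deletes the physical file.
# Assumption: "$root" stands in for / on a merged-/usr system and
# "tool" is a made-up name for the affected $file.
set -e
root=$(mktemp -d)
mkdir -p "$root/usr/bin"
ln -s usr/bin "$root/bin"            # aliasing symlink: /bin -> usr/bin

# The updated $second ships the file at its physical location.
echo "new version" > "$root/usr/bin/tool"

# Removing $first deletes the path it recorded, i.e. the aliased one.
rm -f "$root/bin/tool"

# Check whether the file owned by $second survived.
if [ -e "$root/usr/bin/tool" ]; then
    result="kept"
else
    result="lost"
fi
echo "physical file: $result"
rm -rf "$root"
```

Because /bin resolves to usr/bin, both names refer to the same directory
entry, so the removal cannot tell the old file from the new one.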

The following packages technically contain a conflict involving
aliasing:
 * bfh-container
 * busybox
 * busybox-static
 * busybox-syslogd
 * daemontools-run
 * dhcpcd-base
 * elogind
 * exfat-utils
 * exfatprogs
 * finit-sysv
 * inetutils-syslogd
 * libelogind0
 * libnfsidmap1
 * libsystemd0
 * molly-guard
 * openresolv
 * opensysusers
 * progress-linux-container
 * resolvconf
 * runit
 * runit-init
 * systemctl
 * systemd
 * systemd-resolved
 * systemd-standalone-sysusers
 * systemd-standalone-tmpfiles
 * systemd-sysv
 * sysvinit-core
 * udev
 * zabbix-proxy-mysql
 * zabbix-proxy-pgsql
 * zabbix-proxy-sqlite3
 * zabbix-server-mysql
 * zabbix-server-pgsql

The following files are affected (normalized to aliased representation):
 * /bin/busybox
 * /bin/loginctl
 * /bin/systemctl
 * /bin/systemd-sysusers
 * /bin/systemd-tmpfiles
 * /lib/dhcpcd/dhcpcd-hooks/01-test
 * /lib/dhcpcd/dhcpcd-hooks/20-resolv.conf
 * /lib/dhcpcd/dhcpcd-hooks/30-hostname
 * /lib/dhcpcd/dhcpcd-hooks/60-ntp-common.conf
 * /lib/dhcpcd/dhcpcd-hooks/62-chrony.conf
 * /lib/dhcpcd/dhcpcd-hooks/64-timesyncd.conf
 * /lib/dhcpcd/dhcpcd-hooks/68-openntpd.conf
 * /lib/dhcpcd/dhcpcd-run-hooks
 * /lib/systemd/system/sysinit.target.wants/systemd-hwdb-update.service
 * /lib/systemd/system/systemd-hwdb-update.service
 * /lib/systemd/system/zabbix-proxy.service
 * /lib/systemd/system/zabbix-server.service
 * /lib/udev/rules.d/70-uaccess.rules
 * /lib/udev/rules.d/71-seat.rules
 * /lib/udev/rules.d/73-seat-late.rules
 * /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so
 * /lib/x86_64-linux-gnu/libnfsidmap/regex.so
 * /lib/x86_64-linux-gnu/libnfsidmap/static.so
 * /lib/x86_64-linux-gnu/libnfsidmap/umich_ldap.so
 * /lib/x86_64-linux-gnu/libsystemd.so.0
 * /sbin/coldreboot
 * /sbin/exfatlabel
 * /sbin/fsck.exfat
 * /sbin/halt
 * /sbin/init
 * /sbin/mkfs.exfat
 * /sbin/pm-hibernate
 * /sbin/pm-suspend
 * /sbin/pm-suspend-hybrid
 * /sbin/poweroff
 * /sbin/reboot
 * /sbin/resolvconf
 * /sbin/runlevel
 * /sbin/shutdown
 * /sbin/syslogd
 * /sbin/telinit
 * /sbin/update-service
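
"Normalized to aliased representation" here means that a path below one
of the aliased /usr directories is written without its /usr prefix. A
minimal sketch of such a normalization follows. The function name
to_aliased and the exact set of aliased directories are assumptions, not
taken from the tooling that produced the list.

```shell
#!/bin/sh
# to_aliased PATH: print PATH in its aliased form.
# Assumption: the aliased top-level directories are bin, sbin and the
# lib* variants; any other path is printed unchanged.
to_aliased() {
    case "$1" in
        /usr/bin/*|/usr/sbin/*|/usr/lib/*|/usr/lib32/*|/usr/lib64/*|/usr/libx32/*|/usr/libo32/*)
            # Strip the leading /usr so the path uses the aliased name.
            printf '%s\n' "${1#/usr}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

to_aliased /usr/bin/busybox
to_aliased /sbin/halt
```

With this normalization, a file shipped as /usr/bin/busybox by one
package and as /bin/busybox by another is recognized as the same file.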

While systemctl, libsystemd.so.0 or systemd-sysusers may sound
important, these are only relevant when your upgrade migrates from
docker-systemctl-replacement, elogind or opensysusers during your
dist-upgrade.

The list is incomplete as there are patches adding more. For instance,
cryptsetup, gzip, isc-dhcp-client and netplan-generator will likely gain
affected Conflicts. I expect that time64 might double the number.

I note that for the loss to happen, removal needs to be scheduled, and
apt does not seem to do this often. apt can be forced to
perform temporary removal using dpkg --set-selections by adding mutual
conflicts (mutual conflicts and breaks). At the time of this writing we
do not have any such mutual conflicts in the archive.

In most upgrade scenarios, apt will remove/upgrade package $first before
performing the unpack of $second. In these cases, no loss happens.

Therefore, I hope that the loss cannot be experienced when upgrading
with apt or frontends using apt such as aptitude, but there is no proof
of this.

For some of the conflicts we already added mitigations:
 * systemd-sysv.postinst restores what has been lost
 * gzip.postinst (#1059533) will restore what has been lost

Technically speaking, the change to gzip will introduce a policy
violation, as essential packages are expected to work even when only
unpacked, and gzip will lose e.g. its /usr/bin/zcmp when zutils is
upgraded or removed and only regain it in gzip.postinst. (Again, no loss
happens if apt handles zutils before gzip or when no zutils is
installed, and that's what I expect to happen.)

One takeaway from the CTTE meeting was that this loss should be
mitigated when it may make a system unbootable. That is a property that
is difficult to capture and would likely require mitigating half of the
conflicts.

The mitigation itself is also non-trivial. In the window between unpack
of files that will be lost and actual loss, no maintainer script is run
reliably. Hence, copies of affected files have to be installed
elsewhere:
 * systemd-sysv loses only symlinks whose target is specified in
   postinst.
 * gzip loses shell scripts that will be embedded in its postinst.
 * For larger files such as dhclient, we consider moving the file to an
   unaffected location and then pointing a symlink at it such that only
   the symlink needs to be restored.
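
For the symlink case, the restoring part of a postinst could look
roughly like the sketch below. This is not the code from
systemd-sysv.postinst. The helper name restore_symlink and the example
paths in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# restore_symlink LINK TARGET: recreate LINK pointing at TARGET, but only
# if nothing exists at LINK any more.
# Assumption: hypothetical helper, meant to be called from a postinst.
restore_symlink() {
    link=$1
    target=$2
    # -e follows symlinks, so also test -L to detect a dangling symlink
    # and avoid clobbering it.
    if [ ! -e "$link" ] && [ ! -L "$link" ]; then
        ln -s "$target" "$link"
    fi
}
```

A postinst would then call it once per lost symlink, for example
`restore_symlink /usr/sbin/halt ../bin/systemctl` (paths assumed). When
no loss occurred, the call is a no-op, so it is safe to run on every
configure.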

As you can see, the proposed mitigation is not automatic and introduces
non-trivial code that can contain bugs of its own. The cost of applying
it is non-trivial while the chances of reproducing this kind of problem
in a real-world scenario seem small to me.

After an upgrade, loss can be diagnosed using `dpkg --verify` and any
loss can be rectified by reinstalling affected packages.
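
To pick the lost files out of a longer report, the output can be
filtered for missing entries. The following is a sketch under the
assumption that `dpkg --verify` reports a missing file as a line
starting with the word "missing", optionally followed by a "c" conffile
marker and then the path. The helper name missing_paths is made up.

```shell
#!/bin/sh
# missing_paths: read a `dpkg --verify` style report on stdin and print
# the paths it marks as missing.
# Assumption: missing files appear as "missing   [c] /path".
missing_paths() {
    awk '$1 == "missing" {
        # Drop the "missing" keyword and the optional conffile marker.
        sub(/^missing +(c +)?/, "")
        print
    }'
}
```

It would be used as `dpkg --verify | missing_paths`. The packages owning
the listed paths are then candidates for reinstallation.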

As you can see, the question at hand is not very boolean. For each
affected conflict we may consider:
 * How likely is it to encounter the loss (with apt / with dpkg)?
 * Has it been observed in a real upgrade? (We have no non-artificial
   reports yet.)
 * How much effort is it to develop the mitigation? (Typically a few
   hours, as testing is still non-trivial.)
 * How likely is it to introduce new problems when doing so?
 * How many affected users will read release-notes before rebooting and
   mitigate the problem locally?
 * When will this mitigation be cleaned up?

For the gzip case, we have the additional question whether we tolerate
the temporary policy violation for the trixie upgrade or halt the
/usr-move and retry with a modified dpkg (that could land in trixie, so
we could complete in forky).

Quite obviously, I'm very biased here, because writing those patches
likely falls on me and I'd rather avoid that.

I also note that there may be unknowns beneath. In May last year, things
looked good, then we found empty directory loss and m-a:same shared file
loss. Then things looked good again until October when we found this.
Chances are, there are more unknowns.

If we want to conclude the transition in trixie, we have few options
but to move forward now. Doing so will cause more instances of this
problem category. Adding the mitigations later does not cause
significant extra costs compared to adding them immediately. As I do not
expect the CTTE to revert the /usr-merge for this issue, I'll be moving
forward with introducing more affected conflicts.

Helmut
