Simon McVittie writes ("Bug#1050001: Unwinding directory aliasing"):
> What do you consider to be the end goal of this proposal?

Desired end state
=================

This is a very good question.  I had a very constructive conversation
with Helmut via video chat.  It seems that there's a misunderstanding
about the desired end state.

My idea of a desired end state is as follows:

/bin and /lib etc. remain directories (so there is no aliasing).  All
actual files are shipped in /usr.  / contains compatibility symlinks
pointing into /usr, for those files/APIs/programs where this is needed
(which is far from all of them).  Eventually, over time, the set of
compatibility links is reduced to a mere handful.

I think this is a more desirable situation than the current planned
end state, which is that /bin and /lib are symlinks.

Aliasing is EBW, and "Only use canonical names" is not good enough
==================================================================

There is basically one underlying technical reason for preferring the
un-aliased usrmerge approach: aliasing directories in this way leads
to great complication in file management, especially in package
management software and in individual packages.

The DEP-17 problem list is a survey of the aliasing-induced problems
which have been discovered so far.  But we (still!) keep discovering
new ones.

The current plan, as I understand it, is that we will fix these
problems by arranging to *always* name files by their canonical paths,
ie the ones in /usr.

Naming files by their canonical names will have to be done everywhere.
This is because any time a file is named by a non-canonical path, a
program that tries to manipulate that file might malfunction.
(Whether it malfunctions in practice depends on the precise details
and gets very complicated.)
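
To make the failure mode concrete, here is a minimal sketch of one way
it can go wrong.  It is an illustration under stated assumptions, not a
description of how dpkg or any real tool behaves: `naive_move` and
`make_aliased_root` are hypothetical names, and the "tool" simply
tracks files by path string.  It installs the file under its new name
and then removes the old name, believing the two are different files.

```python
import os
import tempfile


def make_aliased_root(root):
    """Create root/usr/bin and make root/bin a symlink to it (aliased layout)."""
    os.makedirs(os.path.join(root, "usr", "bin"))
    os.symlink("usr/bin", os.path.join(root, "bin"))


def naive_move(root, old_rel, new_rel, data):
    """Hypothetical path-keyed 'move': write the new path, then remove the old.

    The comparison is on path *strings*, so under aliasing the tool does
    not notice that both names refer to the same file.
    """
    old_path = os.path.join(root, old_rel)
    new_path = os.path.join(root, new_rel)
    with open(new_path, "w") as f:
        f.write(data)
    if old_rel != new_rel and os.path.lexists(old_path):
        os.remove(old_path)
    # Report whether the file we just installed still exists.
    return os.path.exists(new_path)


def demo():
    """Return True if the moved file survives in an aliased tree."""
    with tempfile.TemporaryDirectory() as root:
        make_aliased_root(root)
        with open(os.path.join(root, "bin", "tool"), "w") as f:
            f.write("v1")
        return naive_move(root, "bin/tool", "usr/bin/tool", "v2")


if __name__ == "__main__":
    print("file survived:", demo())
```

In the aliased tree the final removal deletes the file that was just
written, because bin/tool and usr/bin/tool are the same file.  With
/bin as a real directory the same steps would leave usr/bin/tool in
place.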

Spotting and mitigating violations is hard
------------------------------------------

We do not currently have good tooling that will spot violations of
this rule.  It's not clear precisely what the right behaviour of our
tools would be; we need to alert *the right set of users* to the
mistakes, and *with the right level of severity*.  Many of our key
tools don't have a good way to produce "critical warnings".  The
consequences of violations are unpredictable and can depend on event
ordering.  But they can be very severe.  So we are creating a source
of bad heisenbugs.

Also, we only have direct control over the behaviour of our own
packages, images, etc. that we (Debian) ship.  Any time anyone in the
field (perhaps an individual sysadmin or user; perhaps a 3rd party
software supplier; perhaps a downstream distro) violates this rule
(whether through ignorance, or choice), affected systems will
malfunction.  (I think this means that relying on lintian, for
example, as a defence against these mistakes, is not good enough.)

The answer implied by the current plan seems to be that these people
are just doing the wrong thing and will have to learn not to?  But the
very existence of the directory symlinks implies a recognition that
confusion over whether to name files in / or in /usr is expected to
continue for a long time.  If it weren't, then there would be no need
for these symlinks.

Violations of the "only use canonical names" rule are required
--------------------------------------------------------------

Worse, violations of the "use only canonical names" rule are not only
expected, they are *necessary*:

There are quite a few places where we will have to keep naming files
by their names in /, because those things appear in highly stable
public APIs/ABIs.  For example, we must ship binaries that refer to
the dynamic loader in /lib; shell scripts must start #!/bin/sh.

Now, those references are almost all in "immutable" contexts, where it
doesn't actually matter, since the file is in fact available by the
non-canonical name.  However, this introduces a new implied rule:
it becomes a bug to take a filename you see in a place where the file
is being *read*, and apply it in a context where the file is going to
be *updated*.

This reuse of a filename is a very natural approach.  It is something
that is frequently done by humans, but it is also sometimes done by
automatic software of many kinds.  It's not something we've even had
to consider before as a thing.  But now it is (sometimes) wrong.
Usually it will work, but sometimes it will make a (perhaps latent or
unpredictable) bug.

Looking towards the future
--------------------------

It seems to me that directory aliasing will continue to be a source of
very annoying bugs indefinitely, well after the transition is fully
complete.  In another 20 years we'll still be debugging strange
installation breakage that will turn out to be due to directory
aliasing.

I don't doubt that the bug rate will be kept "tolerably low" by QA
efforts.  However, we all know what a "tolerably low" bug rate looks
like - systems that are in practice just not quite unreliable enough
to be worth fixing.  And we have much better things to spend our time
and effort (and tolerance for bugs) on.

As I understand it the focus of the current technical work is to try
to figure out how we can get to only-canonical-paths from here, while
working around all of the (potential) bugs which arise during the
transition period, when necessarily we will be naming files sometimes
by their names in / and sometimes by their names in /usr.

This technical work seems really quite difficult.  It's certainly
clear that without funding from Freexian we wouldn't be in a position
to undertake it.  Nevertheless I think it is entirely possible that
this technical work will succeed on its own terms, in the sense that
the upgrades for systems running Debian itself will go reasonably
smoothly with only a tolerable failure rate.

But as I say, I think the end state being worked towards here is
far from the best end state.

If I'm ever entitled to play the "I wrote dpkg" card, I think it's
now.  As the author of dpkg, which I intended to be highly reliable
software (and, I like to think, I succeeded), I think this is a very
poor system design.

And, the approach being taken very seriously privileges Debian itself,
and those well-staffed derivatives able to do the necessary transition
auditing (albeit, indeed, with tooling from Debian).  I am
firmly ideologically opposed to such a tradeoff.

The non-aliased approach
========================

Simon comments on the non-aliased approach:

> This does some but not all of what merged-/usr does: calling /usr/bin/sh
> would become a non-bug, but calling /bin/env would still be an error,
> /bin would still represent non-trivial on-disk and/or in-dpkg-database
> state,

I think that in the long term, /bin *will* become trivial enough.

One of the advantages of the non-aliased approach is that we can
continually improve it to get closer to the desired ideal.

> and we would still potentially have other issues triggered by
> the directories being distinct from one another (like the one discussed
> by the tech committee in #911225, which was exactly a regression caused
> by having moved a library in the traditional Debian way).

From my conversation with Helmut, it seems that we are envisaging, as
part of the aliased-usrmerge approach, that there will be tools to
detect violations of the "refer only by canonical path" rule.

But detecting violations of the "these directories only ought to contain
compat symlinks into /usr" rule is a *lot* simpler.  It can be done,
quite reliably, on end-user systems.
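
As a rough sketch of what such a check might look like (this is an
illustration, not an existing Debian tool; the function name
`find_violations` and the default directory list are assumptions, and
it simplifies by treating every entry that is not a symlink as a
violation, subdirectories included):

```python
import os


def find_violations(root="/", dirs=("bin", "sbin", "lib")):
    """List entries under root/<dir> that are not symlinks into root/usr.

    A directory that is itself a symlink (the aliased layout) is
    reported as a single violation.
    """
    usr = os.path.realpath(os.path.join(root, "usr"))
    bad = []
    for d in dirs:
        top = os.path.join(root, d)
        if os.path.islink(top):
            bad.append(top)  # the directory itself is aliased
            continue
        if not os.path.isdir(top):
            continue
        for name in sorted(os.listdir(top)):
            path = os.path.join(top, name)
            if not os.path.islink(path):
                bad.append(path)  # a real file or directory outside /usr
                continue
            # Resolve the link and require that it lands inside root/usr.
            target = os.path.realpath(path)
            if os.path.commonpath([usr, target]) != usr:
                bad.append(path)
    return bad


if __name__ == "__main__":
    for path in find_violations():
        print("not a compat symlink into /usr:", path)
```

The check needs no knowledge of package contents or of how a file got
there; it only looks at what is on disk, which is why it can be run on
any installed system.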

If we had done usrmerge the non-aliased way, then such a checking
program would be able to detect a /-vs-/usr bug analogous to #911225.
So I think a non-directory-aliases variant of this bug is more
tractable than a directory-aliases variant.

> If I remember correctly, openSUSE tried to get from unmerged /usr to
> merged /usr by essentially the route you propose, successfully reached
> the symlink-farm state, but then got stuck without a way to get from the
> symlink farm to the single symbolic link. Do you have a plan for how that
> would be achieved without breaking upgrades or going behind dpkg's back?

As I say above, I don't think we should ever go to the state with a
single symbolic link.  The end state ought to be /lib and /bin with
about six symlinks in.

I hope this helps clarify my thinking.

Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
