Zack Weinberg wrote:
I haven't been following this long discussion very closely but I do have some opinions
(with my "de facto autoconf maintainer" hat on):
1. As a general rule, it is not safe to change the canonicalization (i.e. the config.sub
output) of an existing system name, *at all*; in many cases, not even if it is wrong. I
find that people working on GNU tools often don't realize just how broadly used these
names are. Changing the canonicalization of "CPU-VENDOR-mingw32", for example,
is very likely to break things like Ansible playbooks and Travis-style CI build matrices
-- one-off files that exist by the tens of thousands and there's no practical way to
*enumerate* them all, let alone get them all changed to satisfy a GNU-internal desire for
a more consistent naming convention.
Perhaps I have been misunderstood; I have been suggesting that we change our
interpretation but keep all existing tuples as they are. I am very
much aware of this issue.
*Very recently introduced* names can be adjusted to correct technical errors. For
example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU libc
port to Windows (see below); config.guess should not produce it and config.sub should not
convert anything into it. But if the patch that had introduced this mistake were more
than a few months old, we would be stuck with it, permanently.
Fortunately, this particular error was caught relatively quickly.
2. We should avoid adding any more information to canonical system names. Things like
the availability of Bourne shell, which of the several available implementations of
"init" (Unix PID 1) is in use, etc. should be handled with Autoconf-style
feature probes. Yes, it's difficult to run ./configure if you don't have a Bourne shell,
but I suspect most of the environments where that's an issue are used primarily as
cross-compilation targets rather than native-build hosts.
A platform without a Bourne shell is (as far as the GNU build system is
concerned) only usable as a cross-compilation target. Issues like shell
availability or choice of init(8) are a reasonable use for the "OS"
field, where an operating system tag is essentially a gestalt summary of
the target environment. The combinatorial explosion that would cause in
modern use is a different issue.
My suggested place to draw the line is, if you reasonably need a cross-compiler
targeting A to be different from a cross-compiler targeting B, then the
distinction between A and B can go in the canonical system name; if you don't,
then it shouldn't. This should be pretty close to existing practice (because
that's exactly how GCC uses CSNs, via ./configure --target) and should give us
concrete reasons to make a decision in each case.
Agreed that calling the third field "operating system" is a holdover
from a past where that actually mattered and operating systems were
proprietary monoliths. This also provides a good first guess at a limit
for what environment details should be in a CSN and what should not:
if the same cross-compiler targets both environments, they should have
the same CSN. However, a system with both GNU libc and Musl libc could
possibly use GCC's multilib facility instead of separate instances of
the compiler, so multilib targets probably need some form of disambiguation.
[...]
3. I like the idea of a "--parseable" option to config.sub/guess that makes them
spit out something easier to parse. My preferred syntax would be a newline- or
semicolon-separated sequence of Bourne shell assignment statements, because, if there was
also a way to ask config.sub/guess to add a prefix to every variable name, that would let
Autoconf scripts process the output with `eval` rather than the nasty bit of parser goo
we have now (_AC_CANONICAL_SPLIT,
https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987). It
would need to be something like
    $ ./config.guess
    aarch64-unknown-linux-gnu
    $ ./config.guess --prefix=host --parseable
    host_cpu=aarch64
    host_vendor=unknown
    host_os=linux-gnu
It would be OK to introduce additional key=value pairs at that point (kernel,
abi, etc), but the existing three (cpu, vendor, os) need to keep emitting
exactly what they do now.
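
To make the `eval` idea concrete, here is a minimal sketch of the consumer side. Neither --parseable nor --prefix exists in config.guess today, so the function emit_parsed below is a hypothetical stand-in that naively splits an already-canonical name at its first two hyphens; it is not the logic of _AC_CANONICAL_SPLIT, and a real implementation would also have to quote the values safely.

```shell
#!/bin/sh
# Hypothetical stand-in for "config.guess --prefix=PREFIX --parseable".
# Assumes NAME is already canonical: CPU-VENDOR-OS, where OS may itself
# contain a hyphen (e.g. linux-gnu). Values are emitted unquoted, which is
# only safe because canonical names contain no shell metacharacters.
emit_parsed () {
    prefix=$1
    name=$2
    cpu=${name%%-*}       # text before the first hyphen
    rest=${name#*-}       # everything after the first hyphen
    vendor=${rest%%-*}    # text before the next hyphen
    os=${rest#*-}         # remainder, hyphens included
    printf '%s_cpu=%s\n%s_vendor=%s\n%s_os=%s\n' \
        "$prefix" "$cpu" "$prefix" "$vendor" "$prefix" "$os"
}

# Consumer side: a single eval replaces hand-written splitting.
eval "$(emit_parsed host aarch64-unknown-linux-gnu)"
echo "$host_cpu $host_vendor $host_os"
```

Because the prefix is chosen by the caller, the same call with "build" or "target" would populate build_* or target_* without any further parsing.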
I was proposing adding a --parse option only to config.sub to avoid code
duplication. I also do not think of this as a "parseable" form but as a
pre-parsed form. I disagree with using --prefix here when --parse could
easily accept that same prefix as its optional argument, especially
since config.{sub,guess} are in such close proximity to configure, which
uses --prefix for a very different purpose.
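
As a rough sketch of the option handling this implies, assuming a spelling of "--parse[=PREFIX]" (neither the option nor the helper name parse_opt exists in config.sub; both are illustrative only):

```shell
#!/bin/sh
# Hypothetical handling of "--parse" with an optional prefix argument.
# Prints the variable-name prefix to use, or fails for any other option.
parse_opt () {
    case $1 in
        --parse=*) echo "${1#--parse=}_" ;;  # "--parse=host" gives "host_"
        --parse)   echo "" ;;                # no prefix: bare variable names
        *)         return 1 ;;               # not the --parse option
    esac
}

var_prefix=$(parse_opt --parse=host) && echo "${var_prefix}cpu"
var_prefix=$(parse_opt --parse) && echo "${var_prefix}cpu"
```

With this shape a single option carries both the request for pre-parsed output and the prefix, so --prefix keeps its one meaning in configure.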
4. We should deemphasize and possibly explicitly deprecate the vendor component
of a CSN. Nowadays, in my experience, it just confuses people.
The problem is that VENDOR was actually important in the dim past and
could still be useful in some contexts today (I expect it to be
particularly helpful with vendor-specific extensions to RISC-V, for
example), but I agree that we should probably settle on a "neutral"
VENDOR tag for CSNs where it really does not matter. I suggest
"generic" for that case, but I am not completely certain how to
distinguish between "generic" and "unknown". To start a discussion, I
suggest that "unknown" is just that, while "generic" is an active
statement that CPU-generic-* is dependent only on the CPU architecture.
-- Jacob