Re: Rethinking configuration tuples

Zack Weinberg Sun, 10 Sep 2023 17:53:28 -0700

I haven't been following this long discussion very closely but I do have some 
opinions (with my "de facto autoconf maintainer" hat on):


1. As a general rule, it is not safe to change the canonicalization (i.e. the 
config.sub output) of an existing system name, *at all*; in many cases, not 
even if it is wrong. I find that people working on GNU tools often don't 
realize just how broadly used these names are. Changing the canonicalization of 
"CPU-VENDOR-mingw32", for example, is very likely to break things like Ansible 
playbooks and Travis-style CI build matrices -- one-off files that exist by the 
tens of thousands, with no practical way to *enumerate* them all, let 
alone get them all changed to satisfy a GNU-internal desire for a more 
consistent naming convention.

*Very recently introduced* names can be adjusted to correct technical errors.  
For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU 
libc port to Windows (see below); config.guess should not produce it and 
config.sub should not convert anything into it.  But if the patch that had 
introduced this mistake were more than a few months old, we would be stuck with 
it, permanently.

2. We should avoid adding any more information to canonical system names.  
Things like the availability of Bourne shell, which of the several available 
implementations of "init" (Unix PID 1) is in use, etc. should be handled with 
Autoconf-style feature probes.  Yes, it's difficult to run ./configure if you 
don't have a Bourne shell, but I suspect most of the environments where that's 
an issue are used primarily as cross-compilation targets rather than 
native-build hosts.

My suggested place to draw the line is, if you reasonably need a cross-compiler 
targeting A to be different from a cross-compiler targeting B, then the 
distinction between A and B can go in the canonical system name; if you don't, 
then it shouldn't.  This should be pretty close to existing practice (because 
that's exactly how GCC uses CSNs, via ./configure --target) and should give us 
concrete reasons to make a decision in each case.

For example, this rule says that the combination of Linux kernel with musl libc 
should be identified as "CPU-VENDOR-linux-musl", not 
"CPU-VENDOR-linux-gnu-musl", regardless of whether the overall system uses 
other GNU components.  This is because the presence or absence of GNU libc 
*does* affect cross-compilation of C programs, but the presence or absence of 
other GNU software doesn't.  [Note: I don't know whether RMS has said anything 
about this, and if he has, I don't care.]
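To illustrate why the libc belongs in the name, here is a minimal sketch of the kind of dispatch a build script might do on a canonical system name. The function name `libc_of` is hypothetical, and the pattern list is deliberately incomplete (it ignores uClibc, Android, and others); it is only meant to show that "linux-musl" and "linux-gnu" select different C library behaviour for a cross-compiler.

```shell
#!/bin/sh
# Hypothetical helper: guess the C library from a canonical system name.
# Only two Linux cases are handled; everything else is "unknown".
libc_of() {
    case $1 in
        *-linux-musl*) echo musl ;;     # also matches linux-musleabihf
        *-linux-gnu*)  echo glibc ;;    # also matches linux-gnueabihf
        *)             echo unknown ;;
    esac
}

libc_of x86_64-pc-linux-musl        # prints "musl"
libc_of aarch64-unknown-linux-gnu   # prints "glibc"
```

Under the rule above, nothing else about the userland would need to appear in the name for this dispatch to work.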

A compiled language *other than* the C family might, in the future, want us to 
make a distinction between cross-compilation targets that existing CSNs do not 
capture, but we can worry about that when it actually happens.

3. I like the idea of a "--parseable" option to config.sub/guess that makes them 
spit out something easier to parse.  My preferred syntax would be a newline- or 
semicolon-separated sequence of Bourne shell assignment statements, because, if 
there were also a way to ask config.sub/guess to add a prefix to every variable 
name, that would let Autoconf scripts process the output with `eval` rather 
than the nasty bit of parser goo we have now (_AC_CANONICAL_SPLIT, 
https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987).
It would need to be something like

$ ./config.guess
aarch64-unknown-linux-gnu
$ ./config.guess --prefix=host --parseable
host_cpu=aarch64
host_vendor=unknown
host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, 
abi, etc), but the existing three (cpu, vendor, os) need to keep emitting 
exactly what they do now.
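As a rough sketch of how a caller might consume such output: the "--parseable" and "--prefix" options are only a proposal in this message, so the example below stands in for them with a hypothetical shell function, `split_csn`, that emits the same kind of assignments from an already-canonical name. It assumes the name has the usual CPU-VENDOR-OS shape and that the values contain no shell metacharacters.

```shell
#!/bin/sh
# Hypothetical stand-in for the proposed
#   config.guess --prefix=PREFIX --parseable
# Splits CPU-VENDOR-OS at the first two hyphens; the OS part may itself
# contain hyphens (e.g. linux-gnu).
split_csn() {
    prefix=$1
    csn=$2
    cpu=${csn%%-*}        # text before the first hyphen
    rest=${csn#*-}        # text after the first hyphen
    vendor=${rest%%-*}
    os=${rest#*-}
    printf '%s_cpu=%s\n%s_vendor=%s\n%s_os=%s\n' \
        "$prefix" "$cpu" "$prefix" "$vendor" "$prefix" "$os"
}

# The caller evaluates the assignments directly. The function runs inside
# a command substitution, so its working variables do not leak.
eval "$(split_csn host aarch64-unknown-linux-gnu)"
echo "$host_cpu $host_vendor $host_os"   # prints "aarch64 unknown linux-gnu"
```

The prefix is what lets one script hold build, host, and target values side by side without the variables colliding.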

4. We should deemphasize and possibly explicitly deprecate the vendor component 
of a CSN.  Nowadays, in my experience, it just confuses people.

zw
