Re: Rethinking configuration tuples

Jacob Bachmeyer Sun, 10 Sep 2023 18:50:33 -0700

Zack Weinberg wrote:

I haven't been following this long discussion very closely but I do have some opinions 
(with my "de facto autoconf maintainer" hat on):


1. As a general rule, it is not safe to change the canonicalization (i.e. the config.sub 
output) of an existing system name, *at all*; in many cases, not even if it is wrong. I 
find that people working on GNU tools often don't realize just how broadly used these 
names are. Changing the canonicalization of "CPU-VENDOR-mingw32", for example, 
is very likely to break things like Ansible playbooks and Travis-style CI build matrices 
-- one-off files that exist by the tens of thousands and there's no practical way to 
*enumerate* them all, let alone get them all changed to satisfy a GNU-internal desire for 
a more consistent naming convention.

Perhaps I have been misunderstood; I have been suggesting to change our interpretation but to keep all existing tuples as they are. I am very much aware of this issue.
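To illustrate the kind of downstream breakage at stake, here is a minimal sketch of a tuple match as it might appear in a one-off CI script. The function name and the category labels are hypothetical; only the tuples themselves come from this thread.

```sh
#!/bin/sh
# Hypothetical downstream helper of the kind found in one-off CI scripts;
# classify_host and its output labels are illustrative, not from any project.
classify_host() {
    case $1 in
        *-*-mingw32)    echo windows ;;
        *-*-linux-gnu*) echo linux-glibc ;;
        *)              echo unknown ;;
    esac
}

classify_host x86_64-pc-mingw32        # matches the existing spelling
classify_host x86_64-pc-windows-gnu    # a re-canonicalized name falls through
```

The second call lands in the default branch, which is how such a script would silently misbehave if config.sub started emitting a different spelling for the same system.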

*Very recently introduced* names can be adjusted to correct technical errors.  For 
example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU libc 
port to Windows (see below); config.guess should not produce it and config.sub should not 
convert anything into it.  But if the patch that had introduced this mistake were more 
than a few months old, we would be stuck with it, permanently.


Fortunately, this particular error was caught relatively quickly.

2. We should avoid adding any more information to canonical system names.  Things like 
the availability of Bourne shell, which of the several available implementations of 
"init" (Unix PID 1) is in use, etc. should be handled with Autoconf-style 
feature probes.  Yes, it's difficult to run ./configure if you don't have a Bourne shell, 
but I suspect most of the environments where that's an issue are used primarily as 
cross-compilation targets rather than native-build hosts.

A platform without a Bourne shell is (as far as the GNU build system is concerned) only usable as a cross-compilation target. Issues like shell availability or choice of init(8) are a reasonable use for the "OS" field, where an operating system tag is essentially a gestalt summary of the target environment. The combinatorial explosion that would cause in modern use is a different issue.

My suggested place to draw the line is, if you reasonably need a cross-compiler 
targeting A to be different from a cross-compiler targeting B, then the 
distinction between A and B can go in the canonical system name; if you don't, 
then it shouldn't.  This should be pretty close to existing practice (because 
that's exactly how GCC uses CSNs, via ./configure --target) and should give us 
concrete reasons to make a decision in each case.

Agreed that calling the third field "operating system" is a holdover from a past where that actually mattered and operating systems were proprietary monoliths. This also provides a good first guess at a limit for what environment details should be in a CSN and what should not: if the same cross-compiler targets both environments, they should have the same CSN. However, a system with both GNU libc and Musl libc could possibly use GCC's multilib facility instead of separate instances of the compiler, so multilib targets probably need some form of disambiguation.

[...]

3. I like the idea of a "--parseable" option to config.sub/guess that makes them 
spit out something easier to parse.  My preferred syntax would be a newline- or 
semicolon-separated sequence of Bourne shell assignment statements, because, if there was 
also a way to ask config.sub/guess to add a prefix to every variable name, that would let 
Autoconf scripts process the output with `eval` rather than the nasty bit of parser goo 
we have now (_AC_CANONICAL_SPLIT, 
https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987).  It 
would need to be something like

$ ./config.guess
aarch64-unknown-linux-gnu
$ ./config.guess --prefix=host --parseable
host_cpu=aarch64
host_vendor=unknown
host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, 
abi, etc), but the existing three (cpu, vendor, os) need to keep emitting 
exactly what they do now.

I was proposing adding a --parse option only to config.sub to avoid code duplication. I also do not think of this as a "parseable" form but as a pre-parsed form. I disagree with using --prefix here when --parse could easily accept that same prefix as its optional argument, especially since config.{sub,guess} are in such close proximity to configure, which uses --prefix for a very different purpose.
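As a rough sketch of how a configure-style script might consume pre-parsed output with `eval`: the function below is a hypothetical stand-in, since neither config.sub nor config.guess has such an option today. It assumes its input is already a canonical CPU-VENDOR-OS name, where only the OS field may contain further dashes, and that the values contain no shell metacharacters.

```sh
#!/bin/sh
# Hypothetical stand-in for the proposed pre-parsed output; emit_parsed is
# an illustrative name, not an existing config.sub/config.guess feature.
# Usage: emit_parsed PREFIX CPU-VENDOR-OS
emit_parsed() {
    prefix=$1
    tuple=$2
    cpu=${tuple%%-*}      # text before the first dash
    rest=${tuple#*-}      # text after the first dash
    vendor=${rest%%-*}    # second field
    os=${rest#*-}         # everything left, e.g. "linux-gnu"
    printf '%s_cpu=%s\n' "$prefix" "$cpu"
    printf '%s_vendor=%s\n' "$prefix" "$vendor"
    printf '%s_os=%s\n' "$prefix" "$os"
}

# The consumer needs only eval, with no splitting logic of its own.
eval "$(emit_parsed host aarch64-unknown-linux-gnu)"
echo "$host_cpu / $host_vendor / $host_os"
```

Whether the prefix arrives through a separate option or as the argument to --parse does not change the consumer side; either way the three existing fields keep their current values.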

4. We should deemphasize and possibly explicitly deprecate the vendor component 
of a CSN.  Nowadays, in my experience, it just confuses people.

The problem is that VENDOR was actually important in the dim past and could still be useful in some contexts today (I expect it to be particularly helpful with vendor-specific extensions to RISC-V, for example), but I agree that we should probably settle on a "neutral" VENDOR tag for CSNs where it really does not matter. I suggest "generic" for that case, but I am not completely certain how to distinguish between "generic" and "unknown". To start a discussion, I suggest that "unknown" is just that, while "generic" is an active statement that CPU-generic-* is dependent only on the CPU architecture.



-- Jacob
