Zack Weinberg wrote:
I haven't been following this long discussion very closely but I do have some opinions 
(with my "de facto autoconf maintainer" hat on):

1. As a general rule, it is not safe to change the canonicalization (i.e. the config.sub 
output) of an existing system name, *at all*; in many cases, not even if it is wrong. I 
find that people working on GNU tools often don't realize just how broadly used these 
names are. Changing the canonicalization of "CPU-VENDOR-mingw32", for example, 
is very likely to break things like Ansible playbooks and Travis-style CI build matrices 
-- one-off files that exist by the tens of thousands and there's no practical way to 
*enumerate* them all, let alone get them all changed to satisfy a GNU-internal desire for 
a more consistent naming convention.

Perhaps I have been misunderstood; I have been suggesting that we change our interpretation but keep all existing tuples as they are. I am very much aware of this issue.

*Very recently introduced* names can be adjusted to correct technical errors.  For 
example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU libc 
port to Windows (see below); config.guess should not produce it and config.sub should not 
convert anything into it.  But if the patch that had introduced this mistake were more 
than a few months old, we would be stuck with it, permanently.

Fortunately, this particular error was caught relatively quickly.
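
To illustrate the kind of breakage at stake, here is a minimal sketch of the sort of one-off script that matches on canonical names. The function name classify_host and the "x86_64-w64" CPU-VENDOR pair are illustrative assumptions, not taken from any real playbook or CI configuration:

```shell
#!/bin/sh
# Hypothetical helper of the kind found in CI scripts: map a canonical
# system name to a coarse platform label by pattern matching.
classify_host() {
    case $1 in
        *-*-mingw32)    echo windows ;;
        *-*-linux-gnu*) echo linux ;;
        *)              echo unknown ;;
    esac
}

# The long-established spelling is recognized ...
classify_host x86_64-w64-mingw32
# ... but a recanonicalized spelling silently falls through to "unknown".
classify_host x86_64-w64-windows-gnu
```

Nothing in such a script fails loudly when the canonical name changes; it simply takes the wrong branch, which is why these consumers are so hard to find and fix.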

2. We should avoid adding any more information to canonical system names.  Things like 
the availability of Bourne shell, which of the several available implementations of 
"init" (Unix PID 1) is in use, etc. should be handled with Autoconf-style 
feature probes.  Yes, it's difficult to run ./configure if you don't have a Bourne shell, 
but I suspect most of the environments where that's an issue are used primarily as 
cross-compilation targets rather than native-build hosts.

A platform without a Bourne shell is (as far as the GNU build system is concerned) only usable as a cross-compilation target. Issues like shell availability or choice of init(8) are a reasonable use for the "OS" field, where an operating system tag is essentially a gestalt summary of the target environment. The combinatorial explosion that would cause in modern use is a different issue.

My suggested place to draw the line is, if you reasonably need a cross-compiler 
targeting A to be different from a cross-compiler targeting B, then the 
distinction between A and B can go in the canonical system name; if you don't, 
then it shouldn't.  This should be pretty close to existing practice (because 
that's exactly how GCC uses CSNs, via ./configure --target) and should give us 
concrete reasons to make a decision in each case.

Agreed that calling the third field "operating system" is a holdover from a past where that actually mattered and operating systems were proprietary monoliths. This also provides a good first guess at a limit for what environment details should be in a CSN and what should not: if the same cross-compiler targets both environments, they should have the same CSN. However, a system with both GNU libc and Musl libc could possibly use GCC's multilib facility instead of separate instances of the compiler, so multilib targets probably need some form of disambiguation.

[...]

3. I like the idea of a "--parseable" option to config.sub/guess that makes them 
spit out something easier to parse.  My preferred syntax would be a newline- or 
semicolon-separated sequence of Bourne shell assignment statements, because, if there was 
also a way to ask config.sub/guess to add a prefix to every variable name, that would let 
Autoconf scripts process the output with `eval` rather than the nasty bit of parser goo 
we have now (_AC_CANONICAL_SPLIT, 
https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987).  It 
would need to be something like

$ ./config.guess
aarch64-unknown-linux-gnu
$ ./config.guess --prefix=host --parseable
host_cpu=aarch64
host_vendor=unknown
host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, 
abi, etc), but the existing three (cpu, vendor, os) need to keep emitting 
exactly what they do now.

I was proposing adding a --parse option only to config.sub to avoid code duplication. I also do not think of this as a "parseable" form but as a pre-parsed form. I disagree with using --prefix here when --parse could easily accept that same prefix as its optional argument, especially since config.{sub,guess} are in such close proximity to configure, which uses --prefix for a very different purpose.
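
As a rough sketch of how a caller might consume such pre-parsed output with `eval`, the following uses a stand-in shell function rather than config.sub itself. The function split_triplet is hypothetical, and neither --parse nor --parseable exists in config.sub or config.guess today; the sketch only assumes the three-field CPU-VENDOR-OS split shown in the example above:

```shell
#!/bin/sh
# Hypothetical stand-in for the proposed pre-parsed output mode.
# Usage: split_triplet PREFIX CPU-VENDOR-OS
# Prints PREFIX_cpu=..., PREFIX_vendor=... and PREFIX_os=... assignments.
# The body runs in a subshell so its temporaries do not leak to the caller.
split_triplet() (
    prefix=$1
    triplet=$2
    cpu=${triplet%%-*}       # text before the first dash
    rest=${triplet#*-}       # text after the first dash
    vendor=${rest%%-*}       # second field
    os=${rest#*-}            # everything after the second dash
    # Values are single-quoted for eval; this assumes canonical names
    # never contain a single quote.
    printf "%s_cpu='%s'\n%s_vendor='%s'\n%s_os='%s'\n" \
        "$prefix" "$cpu" "$prefix" "$vendor" "$prefix" "$os"
)

# The caller needs no parsing of its own, only eval.
eval "$(split_triplet host aarch64-unknown-linux-gnu)"
echo "$host_cpu $host_vendor $host_os"
```

With the prefix passed in by the caller, the same output could populate build_*, host_* or target_* variables without any further string handling on the Autoconf side.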

4. We should deemphasize and possibly explicitly deprecate the vendor component 
of a CSN.  Nowadays, in my experience, it just confuses people.

The problem is that VENDOR was actually important in the dim past and could still be useful in some contexts today (I expect it to be particularly helpful with vendor-specific extensions to RISC-V, for example), but I agree that we should probably settle on a "neutral" VENDOR tag for CSNs where it really does not matter. I suggest "generic" for that case, but I am not completely certain how to distinguish between "generic" and "unknown". To start a discussion, I suggest that "unknown" is just that, while "generic" is an active statement that CPU-generic-* is dependent only on the CPU architecture.


-- Jacob
