I haven't been following this long discussion very closely, but I do have some opinions (with my "de facto Autoconf maintainer" hat on):
1. As a general rule, it is not safe to change the canonicalization (i.e. the config.sub output) of an existing system name *at all*; in many cases, not even if it is wrong. I find that people working on GNU tools often don't realize just how broadly these names are used. Changing the canonicalization of "CPU-VENDOR-mingw32", for example, is very likely to break things like Ansible playbooks and Travis-style CI build matrices: one-off files that exist by the tens of thousands, with no practical way to *enumerate* them all, let alone get them all changed to satisfy a GNU-internal desire for a more consistent naming convention.

*Very recently introduced* names can be adjusted to correct technical errors. For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO, as there is no GNU libc port to Windows (see below); config.guess should not produce it, and config.sub should not convert anything into it. But if the patch that introduced this mistake were more than a few months old, we would be stuck with it, permanently.

2. We should avoid adding any more information to canonical system names (CSNs). Things like the availability of a Bourne shell, which of the several available implementations of "init" (Unix PID 1) is in use, etc. should be handled with Autoconf-style feature probes. Yes, it's difficult to run ./configure if you don't have a Bourne shell, but I suspect most of the environments where that's an issue are used primarily as cross-compilation targets rather than as native-build hosts.

My suggested place to draw the line is: if you reasonably need a cross-compiler targeting A to be different from a cross-compiler targeting B, then the distinction between A and B can go in the canonical system name; if you don't, then it shouldn't. This should be pretty close to existing practice (because that's exactly how GCC uses CSNs, via ./configure --target) and should give us concrete reasons to make a decision in each case.
For example, this rule says that the combination of the Linux kernel with musl libc should be identified as "CPU-VENDOR-linux-musl", not "CPU-VENDOR-linux-gnu-musl", regardless of whether the overall system uses other GNU components. This is because the presence or absence of GNU libc *does* affect cross-compilation of C programs, but the presence or absence of other GNU software doesn't. [Note: I don't know whether RMS has said anything about this, and if he has, I don't care.] A compiled language *other than* the C family might, in the future, want us to make a distinction between cross-compilation targets that existing CSNs do not capture, but we can worry about that when it actually happens.

3. I like the idea of a "--parseable" option to config.sub/guess that makes them spit out something easier to parse. My preferred syntax would be a newline- or semicolon-separated sequence of Bourne shell assignment statements because, if there were also a way to ask config.sub/guess to add a prefix to every variable name, Autoconf scripts could process the output with `eval` rather than the nasty bit of parser goo we have now (_AC_CANONICAL_SPLIT, https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987). It would need to be something like

    $ ./config.guess
    aarch64-unknown-linux-gnu
    $ ./config.guess --prefix=host --parseable
    host_cpu=aarch64
    host_vendor=unknown
    host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, abi, etc.), but the existing three (cpu, vendor, os) need to keep emitting exactly what they do now.

4. We should deemphasize, and possibly explicitly deprecate, the vendor component of a CSN. Nowadays, in my experience, it just confuses people.

zw
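A minimal sketch of the consumer side of the proposal in point 3, assuming the hypothetical `--prefix`/`--parseable` output format shown there. Neither option exists in config.sub or config.guess today, so `csn_parseable` below is a hypothetical stand-in that fakes the proposed output from an already-canonical name. It splits the name roughly the way Autoconf does now (first field CPU, second vendor, everything after that OS) and assumes at least three dash-separated fields, as config.sub output has. Unlike the transcript above, it also single-quotes each value.

```sh
#!/bin/sh
# Hypothetical stand-in for the proposed
#   ./config.guess --prefix=PREFIX --parseable
# Neither option exists today. Takes an already-canonical system name
# and prints one shell assignment per line.
csn_parseable() {
    _csn_prefix=$1
    _csn_name=$2
    # First field is the CPU, second the vendor, the rest is the OS.
    # Assumes at least three dash-separated fields.
    _csn_cpu=${_csn_name%%-*}
    _csn_rest=${_csn_name#*-}
    _csn_vendor=${_csn_rest%%-*}
    _csn_os=${_csn_rest#*-}
    # Single-quote each value so the output can be passed to eval.
    printf "%s_cpu='%s'\n" "$_csn_prefix" "$_csn_cpu"
    printf "%s_vendor='%s'\n" "$_csn_prefix" "$_csn_vendor"
    printf "%s_os='%s'\n" "$_csn_prefix" "$_csn_os"
}

# Consumer side: a single eval replaces hand-written splitting code.
eval "$(csn_parseable host aarch64-unknown-linux-gnu)"
echo "cpu=$host_cpu vendor=$host_vendor os=$host_os"
# prints: cpu=aarch64 vendor=unknown os=linux-gnu
```

The appeal of the design is that everything after the first field split collapses into one `eval`. Quoting the values keeps that safe as long as the name contains no single quotes, which canonical system names in practice never do.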