Re: Rethinking configuration tuples

John Ericson Sun, 27 Aug 2023 08:15:17 -0700

On 8/27/23 01:06, Jacob Bachmeyer wrote:

As I understand the history, Linux was the first clearly Free kernel available. At the time, BSD still had a dark cloud hanging over it due to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would not be legally recognized as separate until February 1994, although BSD had honestly (almost?) completely diverged from the AT&T codebase in June 1991 with Net/2. Mach was still proprietary; RMS was (or would later be) campaigning for its liberation, which would not occur until some years later. It is worth noting that Linux was originally a toy kernel, and it only attracted the effort it did and grew like it did because it was basically the last missing piece for fully Free systems at the time.


Yes that is how I understand it too

Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context.
JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax.
JSON is far too complicated to use here, except possibly as a "pre-parsed" form that config.sub could output on request for programs that want a structured form instead of parsing the tuple themselves. But for that case, why use JSON instead of a trivial multi-line key=value format?
Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
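Below is a minimal POSIX sh sketch of the printing half of that idea. It assumes the input is already a canonical four-field CPU-VENDOR-KERNEL-OS tuple. The function name `tuple_to_kv` is hypothetical. Unlike the example above, it does no canonicalization, so a three-field input such as x86_64-linux-gnu is rejected rather than expanded.

```sh
# Hypothetical helper (not part of config.sub): print a canonical
# CPU-VENDOR-KERNEL-OS tuple as key=value lines.
# Assumption: the input is already canonical; nothing is canonicalized here.
tuple_to_kv() {
	# Require at least four hyphen-separated fields; a shorter tuple
	# would shift components into the wrong keys.
	case $1 in
	*-*-*-*) ;;
	*)
		echo "tuple_to_kv: expected CPU-VENDOR-KERNEL-OS, got '$1'" >&2
		return 1 ;;
	esac
	rest=$1
	cpu=${rest%%-*};    rest=${rest#*-}
	vendor=${rest%%-*}; rest=${rest#*-}
	kernel=${rest%%-*}; os=${rest#*-}   # OS keeps any further hyphens
	printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' \
		"$cpu" "$vendor" "$kernel" "$os"
}
```

Running `tuple_to_kv x86_64-pc-linux-gnu` prints the same four lines as the hypothetical `config.sub --parse` output. The sketch splits on only the first three hyphens, so a five-field tuple would land entirely in `os`.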

Yes that looks great to me. This shares the abstract syntax with what I had in mind, and anything that understands JSON can easily convert back and forth between the two.

I argue for "duck-typing" here from the user's perspective: if and only if the system in all meaningful ways appears to be the GNU system, there should be a *-gnu* somewhere in the configuration tuple.

I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is meaningful, the exact output of uname is not, etc., but where do we draw the line?

This is also the framework in which *-*-linux-gnu-musl makes sense for a system that uses Musl libc but is otherwise a GNU/Linux system.

Right, but again, where do we draw the line? For example, can one use systemd and its large entourage of intertwined software, or must one use GNU Shepherd or System V init?

Effectively, a different libc is a different ABI.

Agreed, especially when the syscall interface isn't stable, like with many non-Windows kernels.

My larger goal here is to smooth the way for multi-arch systems, with /usr/CPU-VENDOR-KERNEL-OS-ABI or so as the --prefix for binaries built for each architecture. This means that configuration tuples should be detailed enough to allow the needed distinctions, but not so detailed as to themselves become an artificial incompatibility. In larger networked environments, even KERNEL and OS could vary.
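Below is a minimal POSIX sh sketch of composing such a per-architecture prefix. It assumes each component has already been determined separately, and `multiarch_prefix` is a hypothetical name. It rejects empty components and components that contain a hyphen, because either would make the hyphenated name ambiguous.

```sh
# Hypothetical helper: build /usr/CPU-VENDOR-KERNEL-OS-ABI from five
# separately determined components.
multiarch_prefix() {
	if [ $# -ne 5 ]; then
		echo "multiarch_prefix: need CPU VENDOR KERNEL OS ABI" >&2
		return 1
	fi
	for part in "$@"; do
		case $part in
		'' | *-*)
			# Empty or hyphenated fields would shift or merge components.
			echo "multiarch_prefix: bad component '$part'" >&2
			return 1 ;;
		esac
	done
	printf '/usr/%s-%s-%s-%s-%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `./configure --prefix="$(multiarch_prefix x86_64 pc linux gnu musl)"` would install under /usr/x86_64-pc-linux-gnu-musl.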


It's a great goal, and mine too! :)

Yeah, whatever windows-something we settle on for MinGW, I promise my offer still stands to try to get LLVM to (a) accept it, and (b) steer people away from windows-gnu towards it.
Thanks.

No problem! :)

This is the major expectation that using *-*-windows-gnu for MinGW violates: GNU implements POSIX and MinGW does not. Using *-mingnu still leaves considerable room for confusion in my view, which using *-mingw avoids.

That is fine with me. Agreed, "mingnu" takes the proper noun and turns it back into a common noun phrase --- i.e. "minimal GNU" has many valid interpretations, while "MinGW" avoids that by being a known quantity.

After that, I think we are close enough to convene a working group for a JSON/whatever explicit standard. And that would be amazing.
I still oppose JSON because it is way too verbose for this: configuration tuples need to be both expressive and simple enough to type at a shell prompt as arguments to configure. Using JSON by default would also be a very nasty "flag day" that would break all existing programs that use config.sub. Perhaps config.sub could accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably better for things like Autoconf which needs to parse the output of config.sub either way.
No, because Autoconf uses the shell and JSON is a [*profanity elided*] to parse using shell constructs. A flat list of hyphen-delimited tags is almost ideal for the parsing that configure needs to do. In fact, with a few restrictions (met by using canonical ordering) this is what configure /already/ parses.

Oops, yes I was being sloppy confusing concrete and abstract syntax again. Sorry!

I think that while JSON could be better for something like Meson or CMake, for Autoconf your ${key}=${value}\n format is perfect: easy to parse and fully disambiguated.
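Below is a rough POSIX sh illustration of why that format is easy for the shell to consume. `read_tuple_kv` and the `tuple_*` variable names are hypothetical. Each value is assigned by its key, so no positional guessing is involved, and an unknown key is treated as an error.

```sh
# Hypothetical consumer of the key=value format: set one shell variable
# per known key; reject keys it does not recognize.
read_tuple_kv() {
	while IFS='=' read -r key value; do
		case $key in
		cpu)    tuple_cpu=$value ;;
		vendor) tuple_vendor=$value ;;
		kernel) tuple_kernel=$value ;;
		os)     tuple_os=$value ;;
		'')     ;;  # ignore blank lines
		*)
			echo "read_tuple_kv: unknown key '$key'" >&2
			return 1 ;;
		esac
	done
}
```

Feed it from a here-document (`read_tuple_kv <<EOF ... EOF`) or a redirected file rather than a pipe. In many shells each stage of a pipeline runs in a subshell, so the variables would be lost.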

And of course, GNU config should care more about Autoconf than Meson or CMake.

Note that config.sub is itself a shell script, and handling JSON in shell is a giant pain. The most we could reasonably do is what config.sub already does: determine each component as a separate variable and then output that by substituting text into a template.
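Below is a minimal POSIX sh sketch of that template approach, used as a hypothetical JSON output mode. The `--as=json` flag is only a proposal, and `print_tuple_json` is an invented name. It assumes the components are already in the shell variables `cpu`, `vendor`, `kernel` and `os`. It refuses values containing a double quote or backslash instead of attempting JSON escaping.

```sh
# Hypothetical JSON output: substitute already-determined component
# variables into a fixed template.
print_tuple_json() {
	# Assumption: tuple components never need JSON escaping; bail out
	# rather than emit malformed JSON if that ever fails to hold.
	for v in "$cpu" "$vendor" "$kernel" "$os"; do
		case $v in
		*'"'* | *'\'*)
			echo "print_tuple_json: value needs escaping: $v" >&2
			return 1 ;;
		esac
	done
	printf '{"cpu": "%s", "vendor": "%s", "kernel": "%s", "os": "%s"}\n' \
		"$cpu" "$vendor" "$kernel" "$os"
}
```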

Yes, I agree config.sub in its current form (which must be highly portable across different Bourne-shell derivatives) has no hope of parsing JSON. It could output it, or it could output your ${key}=${value}\n format, and it could also consume your format. Your format is ideal for it!

The hyphen-separated form is unambiguous as it stands, or close enough to be resolvable with minimal effort. With a dictionary of allowed element values, it is unambiguous, even if some elements are omitted; resolving ambiguous forms to unambiguous forms using such a dictionary is what config.sub /does/.
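Below is a toy POSIX sh sketch of dictionary-based resolution. It is far simpler than what config.sub actually does: if the second field is a known kernel name, the vendor must have been omitted, so a default is inserted. The kernel list and the vendor defaults (`pc` for x86 CPUs, `unknown` otherwise) are assumptions chosen to match the x86_64-linux-gnu example earlier in the thread. `resolve_tuple` is a hypothetical name.

```sh
# Toy sketch, NOT config.sub's real logic: fill in an omitted vendor
# when the second field is found in a small dictionary of kernels.
resolve_tuple() {
	cpu=${1%%-*}
	rest=${1#*-}
	second=${rest%%-*}
	case $second in
	linux | freebsd* | netbsd* | openbsd*)
		# Vendor omitted; these defaults are illustrative assumptions.
		case $cpu in
		x86_64 | i?86) vendor=pc ;;
		*)             vendor=unknown ;;
		esac
		echo "$cpu-$vendor-$rest" ;;
	*)
		# Second field is not a known kernel: assume vendor is present.
		echo "$1" ;;
	esac
}
```

Already-resolved input passes through unchanged, so applying the function twice gives the same result as applying it once.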

Yes, config.sub should continue to take ambiguous format(s), because normalizing them is its purpose. But see 08ede0dcc1bcfd8b77a80605d4de89c768cab2c7 where I also made sure config.sub was idempotent, correcting some longstanding bugs in the process!

I think ensuring it is idempotent both on the ambiguous dash-separated format /and/ on the unambiguous key-value format would make for a more exacting additional idempotency test suite.
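Below is a sketch of such a test in POSIX sh. `check_idempotent` is a hypothetical helper: it runs a normalizer on each input, runs it again on the result, and reports any input where the two outputs differ. It assumes the normalizer takes one argument and prints one result, as config.sub does for the dash-separated form. Checking the key-value format the same way would need a wrapper around whatever interface eventually exists for it.

```sh
# Hypothetical test helper: verify norm(norm(x)) = norm(x) for each input.
# $1 is the normalizer (a command or shell function); the rest are inputs.
check_idempotent() {
	norm=$1; shift
	status=0
	for input in "$@"; do
		once=$("$norm" "$input") ||
			{ echo "FAIL: '$input' rejected" >&2; status=1; continue; }
		twice=$("$norm" "$once") ||
			{ echo "FAIL: '$once' rejected" >&2; status=1; continue; }
		if [ "$once" != "$twice" ]; then
			echo "FAIL: $input -> $once -> $twice" >&2
			status=1
		fi
	done
	return $status
}
```

For example, `check_idempotent ./config.sub x86_64-linux-gnu aarch64-linux-musl` exits nonzero if any of those inputs fail to normalize stably.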

Thank you; as I mentioned above, the goal is to best support heterogeneous multi-arch systems, while recognizing a tension here. For configure, the configuration tuple should not contain information that can be determined by testing, but for storing multiple binary sets, ABIs do need to be part of the name, even if they can be determined by configure tests.

Agreed, configure tests are better for the "long tail" of other attributes. (IMO if we were to define "operating system", it would be something like the "limit" of all configure checks.)

But a big part of my "kernel-libc" thinking (and I think also Connor's) is that kernel + libc mostly determines the OS for toolchain purposes. E.g. even if there exist both GNU-like and non-GNU linux-musl systems, they can share the same prefixed binaries, so there is no need to distinguish them at the config level.

Do you have a counter-example where a sameKernel-OS_0-sameLibc and a sameKernel-OS_1-sameLibc would need different prefixed binaries, and thus the inclusion of OS in addition to kernel and libc is necessary to avoid binary name collisions?

(FWIW, since I use NixOS, I can always distinguish toolchains by absolute path. Guix has the same property. This might have led me to miss cases over the years where the prefix was not enough! :))

I called the fifth field "LIBCABI" because it can be a libc name or an ABI name; in practice the two are usually closely related. Some existing tuples place a libc name in that slot, while others use a more generic ABI or file format name, such as "elf" in your example. For it to be a source of confusion, there would need to be a libc that supports multiple ABIs, and you would simply use the ABI names in that case.

Perhaps you know of existing examples out in the wild that I am not aware of that need to include kernel, OS, and libc? Do share if you do!



John
