Re: Rethinking configuration tuples

Jacob Bachmeyer Sun, 27 Aug 2023 20:59:31 -0700

John Ericson wrote:

On 8/27/23 01:06, Jacob Bachmeyer wrote:
[...]
Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context.
JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax.
JSON is far too complicated to use here, except possibly as a "pre-parsed" form that config.sub could output on request for programs that want a structured form instead of parsing the tuple themselves. But for that case, why use JSON instead of a trivial multi-line key=value format?
Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
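
As a rough illustration of the parsing half only, the split into key=value lines can be sketched in portable shell. This is a sketch under assumptions: `tuple_to_kv` is a hypothetical helper, not part of config.sub, and it expects a tuple that is already canonical with exactly four fields (cpu-vendor-kernel-os). It does no canonicalization and rejects anything else, including the five-field forms discussed in this thread.

```shell
# Hypothetical helper, not part of config.sub: split an already-canonical
# four-field tuple (cpu-vendor-kernel-os) into key=value lines.
tuple_to_kv() {
    case $1 in
        *-*-*-*-*) echo "tuple_to_kv: too many fields: $1" >&2; return 1 ;;
        *-*-*-*)   ;;
        *)         echo "tuple_to_kv: expected cpu-vendor-kernel-os: $1" >&2; return 1 ;;
    esac
    (
        # Split on hyphens inside a subshell so IFS and the positional
        # parameters of the caller are left untouched.
        set -f
        IFS=-
        set -- $1
        printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' "$1" "$2" "$3" "$4"
    )
}

tuple_to_kv x86_64-pc-linux-gnu
```

The canonicalization step (turning x86_64-linux-gnu into x86_64-pc-linux-gnu) is what config.sub itself contributes and is deliberately left out here.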
Yes that looks great to me. This shares the abstract syntax with what I had in mind, and anything that understands JSON can easily convert back and forth between the two.
I argue for "duck-typing" here from the user's perspective: if and only if the system in all meaningful ways appears to be the GNU system, there should be a *-gnu* somewhere in the configuration tuple.
I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is meaningful, the exact output of uname is not, etc. but where do we draw the line?


That is a question for which I do not currently have a certain answer.  :/

This is also the framework in which *-*-linux-gnu-musl makes sense for a system that uses Musl libc but is otherwise a GNU/Linux system.
Right but again where do we draw the line? For example, can one use systemd and its large entourage of intertwined software, or must one use GNU Shepherd or System V init?

In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is the C runtime library (GNU libc vs. Musl libc) such that shared objects linked for one ABI are not compatible with the other. If Musl libc were exactly 100% binary compatible with GNU libc, then there would be no *-*-linux-gnu-musl platform, since it would be indistinguishable from *-*-linux-gnu. The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...)

[...]
I still oppose JSON because it is way too verbose for this: configuration tuples need to be both expressive and simple enough to type at a shell prompt as arguments to configure. Using JSON by default would also be a very nasty "flag day" that would break all existing programs that use config.sub. Perhaps config.sub could accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably better for things like Autoconf which needs to parse the output of config.sub either way.
No, because Autoconf uses the shell and JSON is a [*profanity elided*] to parse using shell constructs. A flat list of hyphen-delimited tags is almost ideal for the parsing that configure needs to do. In fact, with a few restrictions (met by using canonical ordering) this is what configure /already/ parses.
Oops, yes I was being sloppy confusing concrete and abstract syntax again. Sorry!
I think that while for something like Meson or CMake JSON could be better, for Autoconf your ${key}=${value}\n format is perfect. Easy to parse and fully disambiguated.
And of course, GNU config should care more about Autoconf than Meson or CMake.

Except configure usually does not need a "fully disambiguated" form---the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. The flat list with a defined order is optimal for this strategy, since it makes it easy to check for the presence of any tag or combination of tags.
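
A minimal sketch of that matching strategy, assuming the tuple has already been canonicalized by config.sub. The function name `describe_host` and the particular patterns are illustrative only and are not taken from any real configure script:

```shell
# Illustrative only: classify a canonical tuple with shell case patterns.
# Because the field order is fixed, each pattern tests for a tag (or a
# combination of tags) without splitting the tuple first.
describe_host() {
    case $1 in
        *-*-linux-gnu*)  echo "GNU/Linux" ;;
        *-*-linux-musl*) echo "Linux with Musl libc" ;;
        *-*-linux*)      echo "other Linux" ;;
        *)               echo "not Linux" ;;
    esac
}

describe_host x86_64-pc-linux-gnu
describe_host aarch64-unknown-linux-musl
```

Order matters: the more specific patterns come first, so a tuple such as the proposed *-*-linux-gnu-musl would be caught by the first arm unless a dedicated pattern were placed ahead of it.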

Note that config.sub is itself a shell script, and handling JSON in shell is a giant pain. The most we could reasonably do is what config.sub already does: determine each component as a separate variable and then output that by substituting text into a template.
Yes I agree config.sub in its current form (must be highly portable across different Bourne-shell derivatives) has no hope of parsing JSON. It could output it or it could also output your ${key}=${value}\n format, and it could also consume your format. Your format is ideal for it!

Adding a prefix to each key in the key=value format is trivial and would further help shell scripts that want to "parse by eval", but configure itself tests predicates rather than caring exactly what part of the configuration tuple means what. Put another way, configure is usually looking for a yes/no answer, so a pre-parsed form is less useful than a single string that can be used for pattern matches.
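
For scripts that do want the pre-parsed form, "parse by eval" with a prefix might look like the following sketch. Everything here is an assumption for illustration: `load_kv` is a hypothetical helper, the key=value input is supplied by a here-document rather than by any existing config.sub option, and the `host_` prefix is chosen arbitrarily.

```shell
# Hypothetical helper: read key=value lines on stdin and define one shell
# variable per line, named PREFIX followed by the key.
load_kv() {
    prefix=$1
    while IFS='=' read -r key value; do
        # Accept only plain lowercase keys, so the eval below can only
        # ever form a simple variable assignment.
        case $key in
            '' | *[!a-z_]*) echo "load_kv: bad key: $key" >&2; return 1 ;;
        esac
        # The value is assigned through a variable reference and is
        # never re-parsed as shell code.
        eval "${prefix}${key}=\$value"
    done
}

load_kv host_ <<'EOF'
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
EOF

echo "$host_cpu $host_kernel $host_os"
```

Note that the input must come from a redirection rather than a pipeline; in most shells a pipeline would run the loop in a subshell and the variables would be lost.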

The hyphen-separated form is unambiguous as it stands, or close enough to be resolvable with minimal effort. With a dictionary of allowed element values, it is unambiguous, even if some elements are omitted; resolving ambiguous forms to unambiguous forms using such a dictionary is what config.sub /does/.
Yes config.sub should continue to take ambiguous format(s), because normalizing them is its purpose. But see 08ede0dcc1bcfd8b77a80605d4de89c768cab2c7 where I also made sure config.sub was idempotent, correcting some longstanding bugs in the process!
I think ensuring it is idempotent both on the ambiguous dash-separated format, /and/ on the unambiguous key-value format, would make for a more exacting additional idempotency test suite.
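
The idempotency property on the hyphen-separated format can be stated as a small check: canonicalizing the canonical output must change nothing. The sketch below assumes nothing about the existing test suite; `check_idempotent` is a hypothetical helper that takes the canonicalizer as its first argument.

```shell
# check_idempotent CANON INPUT
# Succeeds if CANON(CANON(INPUT)) equals CANON(INPUT).
# CANON is any command that prints a canonical tuple for its argument.
check_idempotent() {
    canon=$1
    once=$("$canon" "$2") || return 1
    twice=$("$canon" "$once") || return 1
    if [ "$once" != "$twice" ]; then
        echo "not idempotent: $2 -> $once -> $twice" >&2
        return 1
    fi
}

# Assumed usage, with a copy of config.sub in the current directory:
#   check_idempotent ./config.sub x86_64-linux-gnu
```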

There is no reasonable way to feed the key=value format /into/ config.sub: configuration tuples are hyphen-delimited lists. Producing key=value format using config.sub's knowledge of valid tuples might be reasonable for /other/ systems to use instead of needing their own parsers.

Thank you; as I mentioned above, the goal is to best support heterogeneous multi-arch systems, but recognizing a tension here. For configure, the configuration tuple should not contain information that can be determined by testing, but for storing multiple binary sets, ABIs do need to be part of the name, even if they can be determined by configure tests.
Agreed configure tests are better for the "long tail" of other attributes. (IMO if we were to define "operating system", it would be something like the "limit" of all configure checks.)
But a big part of my "kernel-libc" thinking (and I think also Connor's) is that kernel + libc mostly determines an OS for toolchain purposes. E.g. even if there exists both GNU-like and non-GNU linux-musl systems, they can share the same prefixed binaries, so there is no need to distinguish them at the config level.
Do you have a counter-example where a sameKernel-OS_0-sameLibc and a sameKernel-OS_1-sameLibc would need different prefixed binaries, and thus the inclusion of OS in addition to kernel and libc is necessary to avoid binary name collisions?

I am not entirely certain why, but I know that there is some reason we call the common GNU/Linux systems *-*-linux-gnu instead of *-*-linux.

[...]
I called the fifth field "LIBCABI" because it can be a libc name or an ABI name; in practice the two are usually closely related. Some existing tuples place a libc name in that slot, while others use a more generic ABI or file format name, such as "elf" in your example. For it to be a source of confusion, there would need to be a libc that supports multiple ABIs, and you would simply use the ABI names in that case.
Perhaps you know of examples of existing ones out in the wild that I am not aware of that need to include kernel, OS, and libc? Do share if you do!

The major example that immediately comes to mind would be a GNU/Linux distribution using Musl libc. But that comes back to why *-*-linux-gnu exists in the first place...



-- Jacob
