John Ericson wrote:
On 8/27/23 01:06, Jacob Bachmeyer wrote:
[...]
Ah sorry, I shouldn't have made reference to JSON at all --- what I
really was getting at is the /abstract syntax/. In particular,
rather than having an abstract syntax of "list of strings" (parsing
today's concrete syntax by breaking on dash), where the meaning of
each string is ambiguous / context-sensitive, we have one of "keys
mapped to enumerations", i.e. one always knows the meaning of each
component explicitly / without inspecting it or its context.
JSON or your flat list in canonical ordering (where I assume we are
careful to never skip a type of component) are both valid concrete
syntaxes that can be parsed / printed from this abstract syntax.
JSON is far too complicated to use here, except possibly as a
"pre-parsed" form that config.sub could output on request for
programs that want a structured form instead of parsing the tuple
themselves. But for that case, why use JSON instead of a trivial
multi-line key=value format?
Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$
Note that this example both canonicalizes and parses.
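The sketch below makes the hypothetical --parse output concrete. It covers only the splitting step, in POSIX sh, and assumes the input is already a canonical tuple as config.sub prints it (cpu-vendor-kernel-os, or cpu-vendor-os when there is no kernel field). The function name tuple_to_kv is made up for illustration and implies no real config.sub option. Canonicalization, for example turning x86_64-linux-gnu into x86_64-pc-linux-gnu, is deliberately left to config.sub.

```sh
#!/bin/sh
# Hypothetical helper (not part of config.sub): split an already-canonical
# tuple into key=value lines.  Assumes cpu-vendor-kernel-os or
# cpu-vendor-os input; it does not validate or canonicalize anything.
tuple_to_kv () {
  cpu=${1%%-*}        # text before the first hyphen
  rest=${1#*-}        # everything after it
  vendor=${rest%%-*}
  rest=${rest#*-}
  case $rest in
    *-*) kernel=${rest%%-*}; os=${rest#*-} ;;  # kernel field present; the rest is os
    *)   kernel=;            os=$rest ;;       # no kernel field
  esac
  printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' \
    "$cpu" "$vendor" "$kernel" "$os"
}

tuple_to_kv x86_64-pc-linux-gnu
```

Run on the canonical form of the tuple in the example, this prints the same four lines shown there.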
Yes that looks great to me. This shares the abstract syntax with what
I had in mind, and anything that understands JSON can easily convert
back and forth between the two.
I argue for "duck-typing" here from the user's perspective: if and
only if the system in all meaningful ways appears to be the GNU
system, there should be a *-gnu* somewhere in the configuration tuple.
I am OK with duck-typing, but what is "all meaningful ways"? Sure,
POSIX is meaningful, the exact output of uname is not, etc., but where
do we draw the line?
That is a question for which I do not currently have a certain answer. :/
This is also the framework in which *-*-linux-gnu-musl makes sense
for a system that uses Musl libc but is otherwise a GNU/Linux system.
Right but again where do we draw the line? For example, can one use
systemd and its large entourage of intertwined software, or must one
use GNU Shepherd or System V init?
In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is
the C runtime library (GNU libc vs. Musl libc) such that shared objects
linked for one ABI are not compatible with the other. If Musl libc were
exactly 100% binary compatible with GNU libc, then there would be no
*-*-linux-gnu-musl platform, since it would be indistinguishable from
*-*-linux-gnu. The choice of system service management is orthogonal to
this, since it has minimal impact on user programs. (Unless systemd
gets even more outrageously invasive...)
[...]
I still oppose JSON because it is way too verbose for this:
configuration tuples need to be both expressive and simple enough
to type at a shell prompt as arguments to configure. Using JSON by
default would also be a very nasty "flag day" that would break all
existing programs that use config.sub. Perhaps config.sub could
accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably
better for things like Autoconf, which needs to parse the output of
config.sub either way.
No, because Autoconf uses the shell and JSON is a [*profanity
elided*] to parse using shell constructs. A flat list of
hyphen-delimited tags is almost ideal for the parsing that configure
needs to do. In fact, with a few restrictions (met by using
canonical ordering) this is what configure /already/ parses.
Oops, yes I was being sloppy confusing concrete and abstract syntax
again. Sorry!
I think that while JSON could be better for something like Meson or CMake,
for Autoconf your ${key}=${value}\n format is perfect. Easy to
parse and fully disambiguated.
And of course, GNU config should care more about Autoconf than Meson
or CMake.
Except configure usually does not need a "fully disambiguated"
form---the canonical form produced by config.sub is fine, since
configure is usually matching against the full tuple using shell case
patterns. The flat list with a defined order is optimal for this
strategy, since it makes it easy to check for the presence of any tag or
combination of tags.
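Here is a minimal sketch of that strategy. A configure-style script classifies the C library by matching the whole canonical tuple against case patterns, without ever splitting it into components. The helper name host_libc and the pattern list are illustrative assumptions, not taken from any real configure script. Because case tries patterns in order, the *-musl* pattern comes first so that the proposed *-*-linux-gnu-musl form is not mistaken for glibc.

```sh
#!/bin/sh
# Illustrative only: classify the libc by pattern-matching the full tuple.
host_libc () {
  case $1 in
    *-musl*)      echo musl ;;     # linux-musl, linux-musleabihf, linux-gnu-musl
    *-linux-gnu*) echo glibc ;;    # linux-gnu, linux-gnueabihf, ...
    *)            echo unknown ;;
  esac
}

for host in x86_64-pc-linux-gnu arm-unknown-linux-musleabihf x86_64-unknown-freebsd13.0; do
  echo "$host: $(host_libc "$host")"
done
```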
Note that config.sub is itself a shell script, and handling JSON in
shell is a giant pain. The most we could reasonably do is what
config.sub already does: determine each component as a separate
variable and then output that by substituting text into a template.
Yes I agree config.sub in its current form (must be highly portable
across different Bourne-shell derivatives) has no hope of parsing
JSON. It could output it or it could also output your
${key}=${value}\n format, and it could also consume your format. Your
format is ideal for it!
Adding a prefix to each key in the key=value format is trivial and would
further help shell scripts that want to "parse by eval", but configure
itself tests predicates rather than caring exactly what part of the
configuration tuple means what. Put another way, configure is usually
looking for a yes/no answer, so a pre-parsed form is less useful than a
single string that can be used for pattern matches.
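In the "parse by eval" case, a prefix keeps the assignments from clobbering a script's own variables. This is a minimal sketch that assumes key=value lines like the hypothetical --parse output earlier in this thread. The function name prefix_kv and the host_ prefix are made up for illustration. Values are single-quoted before eval, which is safe only because tuple components never contain single quotes.

```sh
#!/bin/sh
# Hypothetical: turn key=value lines on stdin into prefixed shell
# assignments suitable for eval.  Assumes keys are valid identifiers and
# values contain no single quotes (true for configuration tuple parts).
prefix_kv () {
  while IFS='=' read -r key value; do
    [ -n "$key" ] || continue          # skip blank lines
    printf "%s%s='%s'\n" "$1" "$key" "$value"
  done
}

eval "$(printf 'cpu=x86_64\nvendor=pc\nkernel=linux\nos=gnu\n' | prefix_kv host_)"
echo "$host_cpu $host_vendor $host_kernel $host_os"
```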
The hyphen-separated form is unambiguous as it stands, or close
enough to be resolvable with minimal effort. With a dictionary of
allowed element values, it is unambiguous, even if some elements are
omitted; resolving ambiguous forms to unambiguous forms using such a
dictionary is what config.sub /does/.
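Here is a toy illustration of dictionary-based resolution using a tiny hand-written dictionary. If the second field is a known kernel name, the vendor must have been omitted, so a default is inserted. The helper fill_vendor, the three kernel names, and the "unknown" default are all assumptions for illustration. config.sub's real tables are far larger and choose the vendor per CPU: it produces x86_64-pc-linux-gnu, not x86_64-unknown-linux-gnu.

```sh
#!/bin/sh
# Toy resolver: decide whether field 2 is a vendor or a kernel by looking
# it up in a (tiny, illustrative) dictionary of kernel names.
fill_vendor () {
  cpu=${1%%-*}
  rest=${1#*-}
  case ${rest%%-*} in
    linux | kfreebsd | knetbsd)
      echo "$cpu-unknown-$rest" ;;   # field 2 is a kernel: vendor omitted
    *)
      echo "$1" ;;                   # assume field 2 is already a vendor
  esac
}

fill_vendor x86_64-linux-gnu           # vendor omitted, so one is inserted
fill_vendor x86_64-unknown-linux-gnu   # already complete, left unchanged
```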
Yes config.sub should continue to take ambiguous format(s), because
normalizing them is its purpose. But see
08ede0dcc1bcfd8b77a80605d4de89c768cab2c7 where I also made sure
config.sub was idempotent, correcting some longstanding bugs in the
process!
I think ensuring it is idempotent both on the ambiguous dash-separated
format, /and/ on the unambiguous key-value format, would make for a
more exacting additional idempotency test suite.
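An idempotency check of this kind can be sketched as a small harness that verifies norm(norm(x)) = norm(x) for a list of inputs. This is a hypothetical sketch, not the test suite from the commit mentioned above. check_idempotent and the stand-in normalizer toy_norm are made up. With a real checkout, one would pass ./config.sub as the first argument instead.

```sh
#!/bin/sh
# Hypothetical harness: for each input x, check that
# normalize(normalize(x)) = normalize(x).  $1 is the normalizer command
# (e.g. ./config.sub); the remaining arguments are the inputs to try.
check_idempotent () {
  normalize=$1; shift
  status=0
  for tuple in "$@"; do
    once=$("$normalize" "$tuple") || { echo "FAIL: $tuple rejected"; status=1; continue; }
    twice=$("$normalize" "$once") || { echo "FAIL: $once rejected"; status=1; continue; }
    if [ "$once" != "$twice" ]; then
      echo "FAIL: $tuple -> $once -> $twice"
      status=1
    fi
  done
  return $status
}

# Stand-in normalizer for demonstration only: lower-cases its argument.
toy_norm () { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'; }

check_idempotent toy_norm X86_64-PC-Linux-GNU aarch64-linux-gnu && echo "all idempotent"
```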
There is no reasonable way to feed the key=value format /into/
config.sub: configuration tuples are hyphen-delimited lists. Producing
key=value format using config.sub's knowledge of valid tuples might be
reasonable for /other/ systems to use instead of needing their own parsers.
Thank you; as I mentioned above, the goal is to best support
heterogeneous multi-arch systems, while recognizing a tension here.
For configure, the configuration tuple should not contain information
that can be determined by testing, but for storing multiple binary
sets, ABIs do need to be part of the name, even if they can be
determined by configure tests.
Agreed configure tests are better for the "long tail" of other
attributes. (IMO if we were to define "operating system", it would be
something like the "limit" of all configure checks.)
But a big part of my "kernel-libc" thinking (and I think also
Connor's) is that kernel + libc mostly determines an OS for toolchain
purposes. E.g. even if there exist both GNU-like and non-GNU
linux-musl systems, they can share the same prefixed binaries, so
there is no need to distinguish them at the config level.
Do you have a counter-example where a sameKernel-OS_0-sameLibc and a
sameKernel-OS_1-sameLibc would need different prefixed binaries, and
thus the inclusion of OS in addition to kernel and libc is necessary
to avoid binary name collisions?
I am not entirely certain why, but I know that there is some reason we
call the common GNU/Linux systems *-*-linux-gnu instead of *-*-linux.
[...]
I called the fifth field "LIBCABI" because it can be a libc name or
an ABI name; in practice the two are usually closely related. Some
existing tuples place a libc name in that slot, while others use a
more generic ABI or file format name, such as "elf" in your example.
For it to be a source of confusion, there would need to be a libc
that supports multiple ABIs, and you would simply use the ABI names
in that case.
Perhaps you know of existing examples out in the wild, which I am not
aware of, that need to include kernel, OS, and libc? Do share if you do!
The major example that immediately comes to mind would be a GNU/Linux
distribution using Musl libc. But that comes back to why *-*-linux-gnu
exists in the first place...
-- Jacob