Re: Rethinking configuration tuples

John Ericson Wed, 13 Sep 2023 21:28:59 -0700

Oops, I had this email as a draft and didn't hit send. The conversation has moved on a bit since, but I'll send it anyway.


John


------------

On 9/6/23 19:46, Jacob Bachmeyer wrote:

The problem is that system(3) probably /does/ exist in that configuration, but such a system is only usable as a cross-compilation target, since building the GNU tools requires a shell.

Agree! This is exactly what I mean by the "ops time" vs. build time question: if we do have system(3), we don't know in general what shell it will be hooked up to (cross compilation is the general case).

I would say that even if we are compiling natively and we find that the shell behind system(3) supports x, y, z, we shouldn't bake the result of a configure-time check for that into the build: the installed binary might be copied to another system with the same libc but a different shell, and then system(3) would do something else.

I think configs are mainly useful for the host and target systems, and thus should focus on the information needed for them. It sounds like we agree that "do we have a shell [that can do x y z]" is a question that is fine to ask of the build platform, but not really appropriate to ask of the host or target platform. Thus, things like "presence of a shell" are not really good to include in configs.

This is all to say that encoding OS distinctions (as opposed to merely libc distinctions) in configs seems too fraught to me.

Maybe this could go in the config per the "arbitrarily many components, finer distinctions to the right" ("converging sequence") approach, but then I would want it further to the right, e.g. aarch64-unknown-musl-noshell, not aarch64-unknown-noshell-musl.
The idea of lacking a shell was intended as an example of something that would make a Free system /very/ different from the GNU system, enough that it cannot be considered a GNU variant. In other words, a Linux-based system that is clearly /not/ GNU/Linux.

Agreed, it is not GNU/Linux. But I think anything that uses glibc should use "gnu" in the config, because configs have evolved to be about libc more than OS.

The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...)
Agreed, just wanted to double check.
Of course, if systemd *does* get sufficiently outrageously invasive, we might need a *-*-linux-systemd-glibc tuple... (Since systemd gleefully makes extensive use of Linux-kernel-specific features, it cannot possibly be a standard on the GNU system, which supports multiple Free kernels.)
Yes, I agree systemd probably can't be a "bona fide GNU OS", but I draw the opposite conclusion: this is evidence that "gnu" meaning glibc is more important than "gnu" meaning "true GNU OS".
In this hypothetical (that needs to *stay* hypothetical) example, "systemd" has somehow become an "OS" distinct from the GNU system.

Yes.

Except configure usually does not need a "fully disambiguated" form: the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. The flat list with a defined order is optimal for this strategy, since it makes it easy to check for the presence of any tag or combination of tags.
Shell case patterns can be a bit of a footgun. For example, a common mistake is writing * instead of *-*.
If the allowed pattern elements are sufficiently unambiguous, there is no mistake, since `*' matches text including `-'. In fact, when testing an "is tag FOO present?" predicate, `*-foo-* | *-foo' would be correct. (I assume that a CPU type will remain required and will remain first in the list.)
Sorry, I meant as part of a larger pattern. With things like *-stuff-* vs. *-*-stuff-*-*, the extra dashes are needed to make sure "stuff" matches the right component, and even then it only works if one knows the exact number of components (which can be accomplished by *-*... and the ordering of patterns). It is quite subtle!
For the "converging sequence" model, omitting the extra dashes is important, since the number of tags prior to a "floating tag" can vary. (I would actually suggest making "gnu" such a floating tag in this model, with an exact definition to be obtained from a later discussion that would need to include RMS.)
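A minimal POSIX sh sketch of that "is tag FOO present?" predicate, assuming (as stated above) that the CPU type always comes first, so a tag is never the leading element. `has_tag` is a hypothetical helper name, not part of config.sub:

```sh
# Hypothetical helper: succeed if TAG is a whole dash-separated element
# of TUPLE (never the first one, since the CPU type is assumed to lead).
has_tag() {
  tuple=$1
  tag=$2
  # Quoting "$tag" makes it match literally rather than as a glob.
  case $tuple in
    *-"$tag"-* | *-"$tag") return 0 ;;
    *) return 1 ;;
  esac
}

has_tag x86_64-pc-linux-gnu gnu && echo "gnu: present"
has_tag x86_64-pc-linux-gnux32 gnu || echo "gnu: absent (gnux32 is a different tag)"
```

A looser pattern such as `*gnu*` would also accept `gnux32`; anchoring each side on `-` (or end of string) is what keeps the match to whole elements.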

Yeah, I just mean that the more components we have, the more desirable a "sparse" representation becomes, vs. `something------another_thing` explicitly skipping things, because the latter is so annoying (you need to count). But that also creates ambiguities.

Allow the hypothetical --parse option to accept a PREFIX argument and you are pretty much there:
$ ./config.sub --parse=host x86_64-linux-gnu
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_kernel=linux
host_os=gnu
$
That form should be both easily parsed by other tools and suitable for `eval` in shell scripts.
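A rough sketch of what that PREFIX-style output and its `eval` use could look like. `parse_tuple` is a hypothetical stand-in for the proposed `--parse` option (which does not exist in config.sub today), and it assumes the input is already in canonical CPU-VENDOR-KERNEL-OS form, so it does none of config.sub's alias expansion:

```sh
# Hypothetical stand-in for the proposed "config.sub --parse=PREFIX".
# Assumes TUPLE is already canonical CPU-VENDOR-KERNEL-OS; it does no
# alias expansion, so a short form like x86_64-linux-gnu is rejected.
parse_tuple() {
  prefix=$1
  tuple=$2
  # Split on '-' with globbing disabled, then restore IFS and globbing.
  old_ifs=$IFS
  IFS=-
  set -f
  set -- $tuple
  set +f
  IFS=$old_ifs
  if [ $# -ne 4 ]; then
    echo "parse_tuple: expected 4 components in '$tuple'" >&2
    return 1
  fi
  printf '%s=%s\n' "$prefix" "$tuple"
  printf '%s_cpu=%s\n' "$prefix" "$1"
  printf '%s_vendor=%s\n' "$prefix" "$2"
  printf '%s_kernel=%s\n' "$prefix" "$3"
  printf '%s_os=%s\n' "$prefix" "$4"
}

# eval is only safe here because tuple components are plain words.
eval "$(parse_tuple host x86_64-pc-linux-gnu)"
echo "kernel=$host_kernel os=$host_os"
```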
Yup! We're in agreement.
I agree testing is more robust, but for better or worse I still do see scripts using those host_* variables mentioned above. (Testing is possible but requires more care to get right for cross-compilation, for one.)
In this case the test is `case $host in ... esac`.
I would say it is better to case on (combinations of) `host_*` variables than on `$host`, because then one knows exactly which components are being cased upon; there is no ambiguity. I think one should basically only use `host` as a black-box identifier (e.g. for prefixing binaries), and any other time one would like to use `host`, one should use the `host_*` variables instead.
This comes back to the "converging sequence" model issue: what to do with the "floating tags" that are not in fixed fields?
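For illustration, a sketch of casing on joined components, assuming `host_kernel` and `host_os` are already set in the four-field layout of the hypothetical --parse output quoted above. (Autoconf's own AC_CANONICAL_HOST splits differently: it sets `host_os` to the whole remainder, e.g. `linux-gnu`.) `describe_host` is a made-up name, and the `/` separator is an arbitrary choice of a character that does not appear in tuple components:

```sh
# Hypothetical example: branch on (kernel, os) explicitly instead of
# pattern-matching the whole $host string.
describe_host() {
  case $host_kernel/$host_os in
    linux/gnu)  echo "Linux kernel, GNU libc" ;;
    linux/musl) echo "Linux kernel, musl libc" ;;
    linux/*)    echo "Linux kernel, other userland" ;;
    *)          echo "something else" ;;
  esac
}

host_kernel=linux host_os=musl
describe_host
```

Each pattern names exactly one field per slot, so there is no question of which dash-separated position "musl" matched.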

I think it is OK if the key=value format is more expressive. In a "design by committee" approach, the fact that it is new (no history-induced tech debt from anyone) and fully explicit will make it easy to avoid bikeshedding and other disagreements.

We could also think of such components as stuff whose representation in dashed form has "yet to be standardized": the more expressive format can queue up a todo list of sorts.

The problem is still getting it /into/ config.sub: config.sub expects a single command-line argument, while the pre-parsed form spans a few lines.
I don't think that is so hard. config.sub already accepts --gnu-long-args (without confusing them with configs), so we can simply do something like
./config.sub --pre-categorized cpu=x86_64 vendor=pc kernel=linux os=gnu

and then there is no confusing the two forms of input.
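A sketch of how such pre-categorized arguments could be folded back into a dashed tuple, assuming exactly the four keys from the example command above. `from_fields` and its `unknown` vendor default are illustrative choices, not existing config.sub behavior:

```sh
# Hypothetical handling of "--pre-categorized cpu=... vendor=... kernel=... os=...".
from_fields() {
  cpu= vendor= kernel= os=
  for arg in "$@"; do
    case $arg in
      cpu=*)    cpu=${arg#cpu=} ;;
      vendor=*) vendor=${arg#vendor=} ;;
      kernel=*) kernel=${arg#kernel=} ;;
      os=*)     os=${arg#os=} ;;
      *) echo "from_fields: unknown argument '$arg'" >&2; return 1 ;;
    esac
  done
  if [ -z "$cpu" ] || [ -z "$kernel" ] || [ -z "$os" ]; then
    echo "from_fields: cpu=, kernel= and os= are required" >&2
    return 1
  fi
  printf '%s-%s-%s-%s\n' "$cpu" "${vendor:-unknown}" "$kernel" "$os"
}

# Keys may come in any order, since each one names its own field.
from_fields os=gnu cpu=x86_64 kernel=linux vendor=pc
```

Because every value is labelled, a value can never be taken for a different field, and an unrecognized key is rejected rather than silently shifted into the wrong slot.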
If CPU-VENDOR-KERNEL-OS elements are always acceptable in that order for any valid configuration, what is the point of generating the key=value arguments instead of simply substituting into a fixed template?

Well, take MinGW for example. We cannot normalize x86_64-w64-mingw32 away: there are some 3-component normal forms we are stuck with because of backwards compatibility.

The new format is a chance to rethink those things without breaking compatibility: we can always normalize to "cpu=x86_64 vendor=w64 kernel=windows libc/abi=mingw" or similar when outputting in "structured mode".
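A sketch of such a structured-mode normalizer, handling only the legacy MinGW form and canonical *-*-linux-gnu tuples. The key names (`cpu`, `vendor`, `kernel`, `abi`) are placeholders, since the libc/abi naming is left open here, and `structured` is a hypothetical helper:

```sh
# Hypothetical "structured mode" normalizer; key names are placeholders.
structured() {
  tuple=$1
  cpu=${tuple%%-*}
  case $tuple in
    *-w64-mingw32)
      # Legacy 3-component form, kept as-is in dashed output for
      # compatibility but given explicit fields here.
      echo "cpu=$cpu vendor=w64 kernel=windows abi=mingw" ;;
    *-*-linux-gnu)
      rest=${tuple#*-}      # e.g. pc-linux-gnu
      vendor=${rest%%-*}
      echo "cpu=$cpu vendor=$vendor kernel=linux abi=gnu" ;;
    *)
      echo "structured: no rule for '$tuple'" >&2
      return 1 ;;
  esac
}

structured x86_64-w64-mingw32
structured x86_64-pc-linux-gnu
```

The dashed spelling stays whatever backwards compatibility demands; only the structured output has to be uniform.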

[...]
I am not entirely certain why, but I know that there is some reason we call the common GNU/Linux systems *-*-linux-gnu instead of *-*-linux.
To be honest, I think this is basically the "call it GNU/Linux, not Linux" controversy, i.e. at the time it was done for social, not technical, reasons. I don't mind, since now that we have multiple libcs there /is/ a technical reason to distinguish. But this circles back to my hunch that kernel (syscall interface) + libc (ABI) determines the OS uniquely enough for config.sub's purposes.
That is possible, but it is still a valid reason for the GNU Project to stay with that angle.
Yeah, I have no problem with the term GNU/Linux; I just don't think "OS" is useful for config.sub. "Linux + GNU libc" for config.sub; "GNU/Linux" for humans/prose.
To some extent, os=gnu is a promise to the user that gets written into command names, but I am not personally certain what exactly that promise is. The GNU system is extremely flexible.

Right, that is where I think it is practical to yield to evolution elsewhere and interpret "linux-gnu" as meaning "kernel=linux libc=gnu", not "kernel=linux os=gnu".

Erm, I mean not an extant system that would use such a config under your scheme, but an extant config (not necessarily a GNU one; it could be an LLVM, Rust, or other one) for such a system. In other words, I am asking whether there is a case where someone else evidently decided that kernel+libc was not enough information and the OS was also needed to further disambiguate.
I do not know of any off the top of my head.
OK. For the record, I wouldn't focus on this "OS-libc" stuff so much, except I suspect it would get in the way of the sort of grand reconciliation between us, LLVM, Rust, etc. that needs to happen. If the "OS is actually libc" convention, which is how things have ended up elsewhere and how downstream often uses GNU Config in practice, is acceptable to GNU Config, as I hope it can be, that gets us much closer to consensus.
In this interpretation, *-*-windows-gnu for MinGW is /still/ blatantly wrong, since MinGW does not use glibc.


Yes! Here's what I would say to LLVM:

> OK, we will go with you in recognizing that "gnu" means "abi=gnu", not "os=gnu", but in return you deprecate "windows-gnu", because MinGW doesn't use GNU libc.

It's IMO a nice and fair compromise: both sides share the burden/hassle of a deprecation cycle (rather than making one side do all the work), and we end up with much clearer definitions than we started with.


John
