Re: Rethinking configuration tuples

John Ericson Sun, 27 Aug 2023 21:29:33 -0700

On 8/27/23 23:59, Jacob Bachmeyer wrote:
>> I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is 
>> meaningful, the exact output of uname is not, etc. but where do we draw the 
>> line?
> That is a question for which I do not currently have a certain answer.  :/
Thanks, we'll keep trying to tease one out.


>>> This is also the framework in which *-*-linux-gnu-musl makes sense for a 
>>> system that uses Musl libc but is otherwise a GNU/Linux system.
>> 
>> Right but again where do we draw the line? For example, can one use systemd 
>> and its large entourage of intertwined software, or must one use GNU 
>> Shepherd or System V init?
>> 
> 
> In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is the C 
> runtime library (GNU libc vs. Musl libc) such that shared objects linked for 
> one ABI are not compatible with the other.  If Musl libc were exactly 100% 
> binary compatible with GNU libc, then there would be no *-*-linux-gnu-musl 
> platform, since it would be indistinguishable from *-*-linux-gnu.
Err I mean, is there an example of a *-*-linux-$nongnu-musl?

Agreed that if Musl were binary compatible with glibc, there would be no need to 
distinguish at the config level.

> The choice of system service management is orthogonal to this, since it has 
> minimal impact on user programs.  (Unless systemd gets even more outrageously 
> invasive...)
Agreed, just wanted to double check.

> Except configure usually does not need a "fully disambiguated" form---the 
> canonical form produced by config.sub is fine, since configure is usually 
> matching against the full tuple using shell case patterns.  The flat list 
> with a defined order is optimal for this strategy, since it allows to easily 
> check for the presence of any tag or combination of tags.
Shell case patterns can be a bit of a footgun. For example, a common mistake is 
writing * where *-* is needed. I would rather case on disambiguated variables. Indeed, 
AC_CANONICAL_HOST computes host_cpu, host_vendor, and host_os for precisely 
that purpose. If config.sub could split out the disambiguated form, those 
variables could be defined more simply and robustly.
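
To illustrate one way the footgun can show up (a sketch only: split_tuple and 
the two is_hurd_* helpers are hypothetical names, and the split assumes an 
already-canonical CPU-VENDOR-OS tuple): because * also matches hyphens, a 
full-tuple pattern meant for GNU/Hurd accepts GNU/Linux as well, while casing 
on the OS part alone does not.

```shell
#!/bin/sh
# Hypothetical helper: split a canonical CPU-VENDOR-OS tuple into variables.
# Assumes the input already has at least three hyphen-separated fields.
split_tuple() {
    cpu=${1%%-*}        # text before the first hyphen
    rest=${1#*-}        # everything after the first hyphen
    vendor=${rest%%-*}  # second field
    os=${rest#*-}       # remainder, e.g. "linux-gnu" or "gnu"
}

# Matching the whole tuple: "*" can swallow "pc-linux", so this pattern
# also matches x86_64-pc-linux-gnu.
is_hurd_by_tuple() {
    case $1 in
        *-*-gnu*) return 0 ;;
        *) return 1 ;;
    esac
}

# Matching the disambiguated OS field: "linux-gnu" does not start with "gnu".
is_hurd_by_os() {
    split_tuple "$1"
    case $os in
        gnu*) return 0 ;;
        *) return 1 ;;
    esac
}

for t in x86_64-pc-linux-gnu i686-pc-gnu; do
    if is_hurd_by_tuple "$t"; then by_tuple=yes; else by_tuple=no; fi
    if is_hurd_by_os "$t"; then by_os=yes; else by_os=no; fi
    echo "$t: tuple pattern says $by_tuple, OS pattern says $by_os"
done
```

Only the first tuple gets different answers from the two checks, which is 
exactly the case a hand-written pattern tends to miss.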

> Note that config.sub is itself a shell script, and handling JSON in shell is 
> a giant pain.  The most we could reasonably do is what config.sub already 
> does:  determine each component as a separate variable and then output that 
> by substituting text into a template.
>> Yes I agree config.sub in its current form (must be highly portable across 
>> different Bourne-shell derivatives) has no hope of parsing JSON. It could 
>> output it or it could also output your ${key}=${value}\n format, and it 
>> could also consume your format. Your format is ideal for it!
> Adding a prefix to each key in the key=value format is trivial and would 
> further help shell scripts that want to "parse by eval" but configure itself 
> tests predicates rather than caring exactly what part of the configuration 
> tuple means what.  Put another way, configure is usually looking for a yes/no 
> answer, so a pre-parsed form is less useful than a single string that can be 
> used for pattern matches.
I agree testing is more robust, but for better or worse I still do see scripts 
using those host_* variables mentioned above. (Testing is possible but requires 
more care to get right for cross-compilation, for one.)
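
For concreteness, here is roughly what a prefixed key=value output and a 
"parse by eval" consumer could look like. This is a sketch under assumptions: 
tuple_to_kv and the key names cpu, vendor, and os are made up for 
illustration, not anything config.sub provides today.

```shell
#!/bin/sh
# Hypothetical sketch: print a canonical CPU-VENDOR-OS tuple as prefixed
# key='value' lines.  Key names and the function name are assumptions.
# Values are single-quoted for eval; this relies on canonical tuples never
# containing single quotes.
tuple_to_kv() {
    prefix=$1
    tuple=$2
    rest=${tuple#*-}
    printf "%scpu='%s'\n"    "$prefix" "${tuple%%-*}"
    printf "%svendor='%s'\n" "$prefix" "${rest%%-*}"
    printf "%sos='%s'\n"     "$prefix" "${rest#*-}"
}

# A consumer sets host_cpu, host_vendor, and host_os in one step.
eval "$(tuple_to_kv host_ x86_64-pc-linux-gnu)"
echo "cpu=$host_cpu vendor=$host_vendor os=$host_os"
```

The prefix keeps the build, host, and target sets from colliding when a 
script evaluates more than one of them.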

> There is no reasonable way to feed the key=value format *into* config.sub: 
> configuration tuples are hyphen-delimited lists.
I think there is. The overall algorithm is roughly "(a) decide which component 
is which, (b) sanitize and normalize components according to that decision". We 
would skip step (a) and go straight to step (b) in order to do this.

This indicates part of the value of doing this: rather than just "system 
testing" the entirety of config.sub, we would now have something closer to a 
"unit test" of part of it in isolation.
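
A toy sketch of that split, to make steps (a) and (b) concrete. Everything 
here is an assumption for illustration: the function names, the key names, 
and the two normalization rules are placeholders, not config.sub's actual 
tables.

```shell
#!/bin/sh
# Step (a), toy version: decide which hyphen-separated field is which.
# Assumes the input has at least three fields in CPU-VENDOR-OS order.
classify() {
    cpu=${1%%-*}
    rest=${1#*-}
    vendor=${rest%%-*}
    os=${rest#*-}
}

# Step (b), toy version: normalize fields that are already identified.
# These two rules are illustrative placeholders only.
normalize() {
    case $cpu in
        amd64) cpu=x86_64 ;;
    esac
    case $vendor in
        '') vendor=unknown ;;
    esac
    printf '%s-%s-%s\n' "$cpu" "$vendor" "$os"
}

# Tuple input runs (a) and then (b).
from_tuple() {
    classify "$1"
    normalize
}

# key=value input on stdin skips (a): the caller has already said which
# component is which, so only (b) runs.
from_kv() {
    cpu=
    vendor=
    os=
    while IFS='=' read -r key value; do
        case $key in
            cpu) cpu=$value ;;
            vendor) vendor=$value ;;
            os) os=$value ;;
        esac
    done
    normalize
}

from_tuple amd64-pc-linux-gnu
printf 'cpu=amd64\nos=linux-gnu\n' | from_kv
```

With this shape, normalize can be exercised directly through from_kv without 
going through any of the field-guessing logic.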

FWIW, this is similar to rearranging the code to support a mode where 
non-normal-form configs are rejected instead of normalized.

> Producing key=value format using config.sub's knowledge of valid tuples might 
> be reasonable for *other* systems to use instead of needing their own parsers.
Yes it is definitely necessary for that, and that is a good use-case for sure.

>>> Thank you; as I mentioned above, the goal is to best support heterogeneous 
>>> multi-arch systems, but recognizing a tension here.  For configure, the 
>>> configuration tuple should not contain information that can be determined 
>>> by testing, but for storing multiple binary sets, ABIs do need to be part 
>>> of the name, even if they can be determined by configure tests.
>> 
>> Agreed configure tests are better for the "long tail" of other attributes. 
>> (IMO if we were to define "operating system", it would be something like the 
>> "limit" of all configure checks.) 
>> 
>> But a big part of my "kernel-libc" thinking (and I think also Connor's) is 
>> that kernel + libc mostly determines an OS for toolchain purposes. E.g. even 
>> if there exists both GNU-like and non-GNU linux-musl systems, they can share 
>> the same prefixed binaries, so there is no need to distinguish them at the 
>> config level. 
>> 
>> Do you have a counter-example where a sameKernel-OS_0-sameLibc and a 
>> sameKernel-OS_1-sameLibc would need different prefixed binaries, and thus 
>> the inclusion of OS in addition to kernel and libc is necessary to avoid 
>> binary name collisions?
> 
> I am not entirely certain why, but I know that there is some reason we call 
> the common GNU/Linux systems *-*-linux-gnu instead of *-*-linux.
To be honest, I think this is basically the "call it GNU/Linux not Linux" 
controversy --- i.e. at the time it was done for social not technical reasons. 
I don't mind, since now that we have multiple libcs there *is* a technical 
reason to distinguish. But this circles back to my hunch that Kernel (syscall 
interface) + libc (ABI) determines OS uniquely enough for config.sub's purposes.

>>> I called the fifth field "LIBCABI" because it can be a libc name or an ABI 
>>> name; in practice the two are usually closely related.  Some existing 
>>> tuples place a libc name in that slot, while others use a more generic ABI 
>>> or file format name, such as "elf" in your example.  For it to be a source 
>>> of confusion, there would need to be a libc that supports multiple ABIs, 
>>> and you would simply use the ABI names in that case.
>> 
>> Perhaps you know of examples of existing ones out in the wild that I am not 
>> aware of that need to include kernel, OS, and libc? Do share if you do!
> 
> The major example that immediately comes to mind would be a GNU/Linux 
> distribution using Musl libc.  But that comes back to why *-*-linux-gnu 
> exists in the first place...
Erm, I mean not an extant system that would use such a config under your scheme, 
but an extant config (not necessarily a GNU one; it could be an LLVM, Rust, or 
other one) for such a system. In other words, I am asking whether 
there was a case where someone else evidently decided that kernel+libc was not 
enough info and OS was also needed to further disambiguate.

John
