Re: Rethinking configuration tuples

Jacob Bachmeyer Sat, 26 Aug 2023 22:07:14 -0700

John Ericson wrote:

On 8/24/23 23:54, Jacob Bachmeyer wrote:
John Ericson wrote:
This is why I opened with "Operating System" lacks a coherent objective definition.
[...]
As I understand, historically, "operating systems" were proprietary monoliths and the GNU Project originally expected to produce another monolith, but /our/ monolith would be Free Software. As an interim measure, the GNU utilities were designed to be widely portable across the various individually-monolithic proprietary operating systems then in use across a wide variety of hardware. The broader Free Software Movement unexpectedly shattered that state of affairs, leading to the 4-element configuration tuple form, when the Linux kernel became available and it was noticed that---oops!---GNU on Linux and GNU on HURD would have significant differences that at least some of the GNU packages would need to handle. (For example, GNU libc is very different between Linux, where POSIX I/O maps fairly directly to underlying syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but both of these are Free GNU systems.)
This means that the GNU system is a somewhat blurry category, with many variants possible, and is orthogonal to "Linux": there are GNU/Linux systems, GNU systems using other kernels, and Linux-based systems not using GNU at all. This latter category is fairly common in embedded systems, where the GNU utilities are often eschewed for lighter-weight alternatives to save flash space (or, less honorably, to avoid GPL3).
Yes I agree with this state of affairs. I sometimes (but not always!) detect a sort of "Linux Scooped us" sentiment in GNU quarters, but as I see it portability and diversity of distros was pretty much inevitable --- replacing proprietary Unix userlands with GNU software was a huge point in how GNU got going in academic/institutional environments in the early days, and even if Hurd got there before Linux there would be no reason to rip out that portability.

As I understand the history, Linux was the first clearly Free kernel available. At the time, BSD still had a dark cloud hanging over it due to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would not be legally recognized as separate until February 1994, although BSD had honestly (almost?) completely diverged from the AT&T codebase in June 1991 with Net/2. Mach was still proprietary; RMS was (or would later be) campaigning for its liberation, which would not occur until some years later. It is worth noting that Linux was originally a toy kernel, and it only attracted the effort it did and grew like it did because it was basically the last missing piece for fully Free systems at the time.

JSON is pretty much a hard no for me: it is far too complex for what really needs to be a simple structure. Flat strings work very well for the way that GNU software typically expects to parse a configuration tuple using shell constructs. Perhaps it would be better to redefine configuration tuples as a flat list of tags with a canonical ordering? (The reason for a canonical ordering is in part to ensure that all existing coherent configuration tuple strings remain valid and to ensure that text-based pattern matching continues to work.)
Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context.
JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax.

JSON is far too complicated to use here, except possibly as a "pre-parsed" form that config.sub could output on request for programs that want a structured form instead of parsing the tuple themselves. But for that case, why use JSON instead of a trivial multi-line key=value format?


Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
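
To show how little machinery the key=value form needs on the consuming side, here is a minimal sketch in POSIX shell. The function name parse_tuple is made up for illustration, and unlike the hypothetical --parse option it does no canonicalization: it assumes the tuple is already in full CPU-VENDOR-KERNEL-OS form.

```sh
# Hypothetical helper, not part of config.sub: print an already-canonical
# four-element tuple as key=value lines. Runs in a subshell so that the
# IFS and globbing changes do not leak into the caller.
parse_tuple() (
    IFS=-
    set -f            # no pathname expansion while splitting
    set -- $1         # split the tuple on hyphens
    [ $# -eq 4 ] || return 1
    printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' "$1" "$2" "$3" "$4"
)

# A consumer reads the lines back with nothing more than "read".
parse_tuple x86_64-pc-linux-gnu | while IFS='=' read -r key value; do
    printf '%s is %s\n' "$key" "$value"
done
```

The same loop would work unchanged on the output of a real --parse option, if one were ever added.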

[...]
I know Po Lu doesn't like them, because they overlap with existing ones. But what about you two, Adam and Jacob? I am trying to compromise between what various things do already, and also correct things like windows-gnu (even if there is no such thing as the GNU operating system (only multiple GNU Hurd-supporting distros), I agree that MinGW is clearly not a complete enough set of GNU software to earn the right to drop the "minimal" part).
The logical problem with your parenthetical is that it ignores GNU/Linux, which *is* also a GNU system.
Hmm? I meant keep -gnu only for things which actually use GNU libc. Now I suppose something could use GNU libc but be really different in other ways from a real GNU system, but I am not really sure where to draw the line. There is a blurry grey area of "use GNU libc but not sure if it counts as GNU", no?

I argue for "duck-typing" here from the user's perspective: if and only if the system in all meaningful ways appears to be the GNU system, there should be a *-gnu* somewhere in the configuration tuple. This is the major expectation that using *-*-windows-gnu for MinGW violates: GNU implements POSIX and MinGW does not. Using *-mingnu still leaves considerable room for confusion in my view, which using *-mingw avoids. This is also the framework in which *-*-linux-gnu-musl makes sense for a system that uses Musl libc but is otherwise a GNU/Linux system.
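
The "duck-typing" rule amounts to a single shell pattern. A minimal sketch follows; the function name is_gnu_system is made up for illustration, and the five-element tuple and the windows-mingw spelling in the sample list are the forms proposed in this thread, not something today's tools are known to accept.

```sh
# Hypothetical helper: succeed if and only if the tuple contains a
# "-gnu" tag somewhere, i.e. it matches the *-gnu* pattern.
is_gnu_system() {
    case $1 in
        *-gnu*) return 0 ;;
        *)      return 1 ;;
    esac
}

for tuple in x86_64-pc-linux-gnu x86_64-pc-linux-gnu-musl \
             x86_64-pc-linux-musl x86_64-pc-windows-mingw; do
    if is_gnu_system "$tuple"; then
        echo "$tuple: presents as the GNU system"
    else
        echo "$tuple: not GNU"
    fi
done
```

Note that *-mingnu does not match this pattern, because no hyphen directly precedes "gnu", while *-*-windows-gnu does match, which is exactly the false promise described above.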

Effectively, a different libc is a different ABI. My larger goal here is to smooth the way for multi-arch systems, with /usr/CPU-VENDOR-KERNEL-OS-ABI or so as the --prefix for binaries built for each architecture. This means that configuration tuples should be detailed enough to allow the needed distinctions, but not so detailed as to themselves become an artificial incompatibility. In larger networked environments, even KERNEL and OS could vary.

I also quibble with CPU-VENDOR-linux-gnu and CPU-VENDOR-linux-musl. Android and GNU are different operating systems that both (can) use the Linux kernel, so I agree with CPU-VENDOR-linux-android for Android. The other two I see as: *-*-linux-gnu --- the GNU/Linux system, using GNU libc unless otherwise specified; *-*-linux-musl --- some unspecified Linux-based system using Musl libc, not necessarily using GNU.
With the proposed five-element form, the ambiguity is resolved: *-*-linux-gnu-musl --- a variant GNU/Linux system, using Musl libc.
Similar to the above, I know when something is/isn't using a specific libc, but any other distinction seems very blurry to me. See also what Connor wrote (perhaps more diplomatically than my "operating systems are inherently subjective!" bombast :))

Again, "duck typing"---if the system appears to be the GNU system, the tuple should contain *-gnu* somewhere.

If we can accept these, I think I will have no problem getting LLVM to accept windows-mingnu, and perhaps even warn/deprecate windows-gnu.
I still say this should be windows-mingw, but yes "windows-gnu" should definitely be deprecated, removed, and reserved in case someone actually ports a POSIX GNU environment to Windows.
Yeah whatever windows-something we settle on for MinGW, I promise my offer still stands to try to get LLVM to (a) accept it, and (b) steer people away from windows-gnu towards it.


Thanks.

After that, I think we are close enough to convene a working group for a JSON/whatever explicit standard. And that would be amazing.
I still oppose JSON because it is way too verbose for this: configuration tuples need to be both expressive and simple enough to type at a shell prompt as arguments to configure. Using JSON by default would also be a very nasty "flag day" that would break all existing programs that use config.sub. Perhaps config.sub could accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably better for things like Autoconf which needs to parse the output of config.sub either way.

No, because Autoconf uses the shell and JSON is a [*profanity elided*] to parse using shell constructs. A flat list of hyphen-delimited tags is almost ideal for the parsing that configure needs to do. In fact, with a few restrictions (met by using canonical ordering) this is what configure /already/ parses.
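
For illustration, this is roughly the style of matching meant here, sketched as a standalone function rather than as real configure output. The name describe_host and the descriptive strings are made up, and the *-*-linux-gnu-musl branch assumes the five-element form proposed in this thread.

```sh
# Hypothetical sketch: classify a canonical tuple with nothing but a
# shell "case". The order matters: the more specific pattern comes first.
describe_host() {
    case $1 in
        *-*-linux-gnu-musl) echo "GNU/Linux variant, Musl libc" ;;
        *-*-linux-gnu*)     echo "GNU/Linux" ;;
        *-*-linux-musl*)    echo "Linux-based, Musl libc" ;;
        *-*-linux-android*) echo "Android" ;;
        *)                  echo "other" ;;
    esac
}

describe_host x86_64-pc-linux-gnu
describe_host aarch64-unknown-linux-android
```

Doing the same with JSON input would first need a JSON parser, which the shell does not provide.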

I am even OK if the dash-separating is a sort of "legacy mode" thing that remains ambiguous so long as one can always convert the unambiguous form (e.g. JSON) to it with much less logic than config.sub. (e.g. do all the work in "normalize to JSON", and making "JSON to old format" a very simple follow-up step.)

Note that config.sub is itself a shell script, and handling JSON in shell is a giant pain. The most we could reasonably do is what config.sub already does: determine each component as a separate variable and then output that by substituting text into a template.

The hyphen-separated form is unambiguous as it stands, or close enough to be resolvable with minimal effort. With a dictionary of allowed element values, it is unambiguous, even if some elements are omitted; resolving ambiguous forms to unambiguous forms using such a dictionary is what config.sub /does/.
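
A toy version of that dictionary lookup, to make the idea concrete. This is a sketch under loud assumptions and not how config.sub is actually written: the function name resolve_tuple and the tiny dictionary (one kernel name, two OS names) are made up, and a missing vendor simply becomes "unknown", whereas the real script chooses per-CPU defaults such as "pc".

```sh
# Hypothetical sketch: the first tag is always the CPU; every later tag
# is assigned a role by looking it up in a small dictionary.
resolve_tuple() (
    vendor=unknown kernel= os=
    IFS=-
    set -f            # no pathname expansion while splitting
    set -- $1         # split the tuple on hyphens
    cpu=$1; shift
    for tag in "$@"; do
        case $tag in
            linux)         kernel=$tag ;;   # known kernel names
            gnu | android) os=$tag ;;       # known OS names
            *)             vendor=$tag ;;   # anything else: vendor
        esac
    done
    printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' \
        "$cpu" "$vendor" "$kernel" "$os"
)

resolve_tuple x86_64-linux-gnu      # vendor omitted, roles still clear
```

Because each tag is recognized by its value rather than its position, the three-element and four-element spellings resolve to the same kernel and OS.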

An alternate proposal hinted at above is to redefine configuration tuples as a flat tag list with canonical ordering. For example, a CPU type always comes first, but the rest is just a set of tags further describing the system, generally working from wide categories (like CPU architecture) to narrow categories (like choice of libc). A larger single installation could easily have some variety in the narrower categories; a network cluster running a single system image (which I understand is an eventual goal for HURD) could even have a variety of CPU types.
Yeah I think the "increasingly narrowing" way of thinking about it (almost like a converging sequence from Calculus) is very good.

Thank you; as I mentioned above, the goal is to best support heterogeneous multi-arch systems, but recognizing a tension here. For configure, the configuration tuple should not contain information that can be determined by testing, but for storing multiple binary sets, ABIs do need to be part of the name, even if they can be determined by configure tests.

But to Adam's point, I think it is good that we recognize that while there are 5-element tuples today (e.g. in the LLVM test suite), I don't think any of them do OS-LIBC; instead I see things like aarch64-unknown-windows-gnu-elf. Not saying OS-LIBC is inherently bad (though I do have some reservations like Connor's), just that OS-LIBC is novel.

I called the fifth field "LIBCABI" because it can be a libc name or an ABI name; in practice the two are usually closely related. Some existing tuples place a libc name in that slot, while others use a more generic ABI or file format name, such as "elf" in your example. For it to be a source of confusion, there would need to be a libc that supports multiple ABIs, and you would simply use the ABI names in that case.



-- Jacob
