John Ericson wrote:
This is why I opened with "Operating System" lacks a coherent
objective definition.
The more pugilistic message is to say the rest of the world doesn't
think the GNU operating system exists --- that there is simply a
choice of kernel (Linux, k*BSD, Hurd, something else...) and choices
of libraries and system components on top of that, and many
combinations are possible. The rest of the world might say this in a
mean way, but I say it is actually a /good/ thing --- software freedom
means one /can/ choose one's components à la carte, and only a lack of
software freedom results in a kernel and mass of libraries outside
one's control blurring together into a scary "take it or leave it"
monolith we call an operating system.
As I understand, historically, "operating systems" were proprietary
monoliths and the GNU Project originally expected to produce another
monolith, but /our/ monolith would be Free Software. As an interim
measure, the GNU utilities were designed to be widely portable across
the various individually-monolithic proprietary operating systems then
in use across a wide variety of hardware. The broader Free Software
Movement unexpectedly shattered that state of affairs, leading to the
4-element configuration tuple form, when the Linux kernel became
available and it was noticed that---oops!---GNU on Linux and GNU on HURD
would have significant differences that at least some of the GNU
packages would need to handle. (For example, GNU libc is very different
between Linux, where POSIX I/O maps fairly directly to underlying
syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but
both of these are Free GNU systems.)
This means that the GNU system is a somewhat blurry category, with many
variants possible, and is orthogonal to "Linux": there are GNU/Linux
systems, GNU systems using other kernels, and Linux-based systems not
using GNU at all. This latter category is fairly common in embedded
systems, where the GNU utilities are often eschewed for lighter-weight
alternatives to save flash space (or, less honorably, to avoid GPL3).
On 8/24/23 08:51, Adam Joseph wrote:
[...]
It seems like a lot of the proposals in this thread are being evaluated not
based on whether or not they are coherent, but rather on whether or not they
take us a few nanometers closer to whatever LLVM's internal
implementation details happen to be this week.
I care about coherence; the reason I like to see what LLVM does is that
working from a parsed representation forces the software to be much
more honest. Since GNU config doesn't reveal its categories but just
spits out another opaque string, there is no external pressure for its
categorization to be any good. LLVM, on the other hand, dispenses with
strings entirely and just uses the enums, so it is forced to make sure
those enums make sense and work for the branching the program has to do.
LLVM parsing of configs is ad-hoc Postel's law stuff like everyone
else, but its internal representation is actually quite stable.
Parsing is the ugly nasty part that gets to the pristine clear
ontology on the other side.
Ultimately I would like to convene everyone to commit to an agreed
upon internal representation too. E.g. clang and GNU config could both
spit out some JSON that is unambiguous and should match. I think that
would alleviate a lot of Adam's concerns about "following LLVM". But I
don't think it is possible to convene the working group needed to
standardize such a format yet, because there is little trust between
parties. Moving us "a few nanometers closer" on each side
demonstrates that there is willingness to compromise.
JSON is pretty much a hard no for me: it is far too complex for what
really needs to be a simple structure. Flat strings work very well for
the way that GNU software typically expects to parse a configuration
tuple using shell constructs. Perhaps it would be better to redefine
configuration tuples as a flat list of tags with a canonical ordering?
(The reason for a canonical ordering is in part to ensure that all
existing coherent configuration tuple strings remain valid and to ensure
that text-based pattern matching continues to work.)
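
To illustrate the pattern-matching point, here is a minimal sketch in Python,
using fnmatch to stand in for a shell "case" statement. The pattern table and
the classify() helper are hypothetical, not taken from config.sub or any
existing package; the point is only that a glob written against the
four-element form keeps matching when a further tag is appended.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table, loosely modelled on a shell `case` statement.
# Patterns are tried in order and the first match wins.
PATTERNS = [
    ("*-*-linux-gnu*", "GNU/Linux"),
    ("*-*-linux-musl*", "Linux with Musl libc"),
    ("*-*-windows-mingw*", "MinGW"),
]


def classify(config_tuple: str) -> str:
    """Return the label of the first glob pattern that matches the tuple."""
    for pattern, label in PATTERNS:
        if fnmatchcase(config_tuple, pattern):
            return label
    return "unknown"
```

Because the trailing "*" also matches the empty string, the same pattern
covers both x86_64-pc-linux-gnu and x86_64-pc-linux-gnu-musl.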
-----------
Concretely, I think these are pretty clear configs:
CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish
things, TODO distinguish between MSVCRT and UCRT
I say that this one really should just be *-mingw. Note that there are
both MinGW32 and MinGW64, corresponding to 32-bit and 64-bit Windows
APIs. Should that be included or should the CPU type be used to
distinguish? (e.g. i686-pc-windows-mingw is MinGW32 and
x86_64-pc-windows-mingw is MinGW64?)
In the proposed five-element form, MSVCRT and UCRT are easily
distinguished. Example:
i686-pc-windows-mingw-msvcrt
i686-pc-windows-mingw-ucrt
x86_64-pc-windows-mingw-msvcrt
x86_64-pc-windows-mingw-ucrt
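
A rough sketch of how the five-element form could be split into fields, in
Python. The field names (cpu, vendor, kernel, system, runtime) and the parse()
helper are my own labels for illustration and are not defined by config.sub;
the sketch also assumes no element contains a hyphen.

```python
from typing import NamedTuple, Optional


class ConfigTuple(NamedTuple):
    cpu: str
    vendor: str
    kernel: str
    system: str
    runtime: Optional[str]  # None when the fifth element is omitted


def parse(config_tuple: str) -> ConfigTuple:
    """Split a four- or five-element configuration tuple into named fields."""
    parts = config_tuple.split("-")
    if len(parts) == 4:
        return ConfigTuple(parts[0], parts[1], parts[2], parts[3], None)
    if len(parts) == 5:
        return ConfigTuple(*parts)
    raise ValueError(f"expected 4 or 5 elements, got {len(parts)}: {config_tuple!r}")
```

With this, parse("i686-pc-windows-mingw-ucrt").runtime gives "ucrt", while a
plain four-element tuple leaves the runtime unspecified.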
CPU-VENDOR-windows-cygnus # Cygwin
CPU-VENDOR-windows-msys # MSYS2, a lot like Cygwin
CPU-VENDOR-windows-msvc # MS C + MS C++
CPU-VENDOR-linux-gnu # gnu libc
CPU-VENDOR-linux-musl # musl libc
CPU-VENDOR-linux-android # bionic libc
I know Po Lu doesn't like them, because they overlap with existing
ones. But what about you two, Adam and Jacob? I am trying to
compromise between what various things do already, and also
correct things like windows-gnu (even if there is no such thing as the
GNU operating system (only multiple GNU Hurd-supporting distros), I
agree that MinGW is clearly not a complete enough set of GNU
software to earn the right to drop the "minimal" part).
The logical problem with your parenthetical is that it ignores
GNU/Linux, which *is* also a GNU system.
I also quibble with CPU-VENDOR-linux-gnu and CPU-VENDOR-linux-musl.
Android and GNU are different operating systems that both (can) use the
Linux kernel, so I agree with CPU-VENDOR-linux-android for Android. The
other two I see as: *-*-linux-gnu --- the GNU/Linux system, using GNU
libc unless otherwise specified; *-*-linux-musl --- some unspecified
Linux-based system using Musl libc, not necessarily using GNU.
With the proposed five-element form, the ambiguity is resolved:
*-*-linux-gnu-musl --- a variant GNU/Linux system, using Musl libc.
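
The reading of the Linux tuples described above can be sketched as a small
Python function. The describe_linux() name and the returned labels are
hypothetical; the mapping itself just restates the interpretation given here
(GNU libc as the default for *-*-linux-gnu, an unspecified system for
*-*-linux-musl, Bionic for Android).

```python
def describe_linux(config_tuple: str) -> dict:
    """Interpret a *-*-linux-* tuple under the proposed five-element reading."""
    parts = config_tuple.split("-")
    if len(parts) not in (4, 5) or parts[2] != "linux":
        raise ValueError(f"not a four- or five-element Linux tuple: {config_tuple!r}")
    fourth = parts[3]
    fifth = parts[4] if len(parts) == 5 else None
    if fourth == "gnu":
        # GNU/Linux; GNU libc unless a fifth element says otherwise.
        return {"system": "GNU/Linux", "libc": fifth or "glibc"}
    if fourth == "musl" and fifth is None:
        # Some Linux-based system using Musl, not necessarily GNU.
        return {"system": "unspecified Linux-based system", "libc": "musl"}
    if fourth == "android" and fifth is None:
        return {"system": "Android", "libc": "bionic"}
    raise ValueError(f"unrecognized Linux tuple: {config_tuple!r}")
```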
If we can accept these, I think I will have no problem getting LLVM to
accept windows-mingnu, and perhaps even warn/deprecate windows-gnu.
I still say this should be windows-mingw, but yes "windows-gnu" should
definitely be deprecated, removed, and reserved in case someone actually
ports a POSIX GNU environment to Windows.
After that, I think we are close enough to convene a working group for
a JSON/whatever explicit standard. And that would be amazing.
I still oppose JSON because it is way too verbose for this:
configuration tuples need to be both expressive and simple enough to
type at a shell prompt as arguments to configure. Using JSON by default
would also be a very nasty "flag day" that would break all existing
programs that use config.sub. Perhaps config.sub could accept an
--as=json parameter for JSON output?
An alternate proposal hinted at above is to redefine configuration tuples
as a flat tag list with canonical ordering. For example, a CPU type
always comes first, but the rest is just a set of tags further
describing the system, generally working from wide categories (like CPU
architecture) to narrow categories (like choice of libc). A larger
single installation could easily have some variety in the narrower
categories; a network cluster running a single system image (which I
understand is an eventual goal for HURD) could even have a variety of
CPU types.
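
As a rough sketch of the flat-tag-list idea, in Python: the CPU type comes
first and the remaining tags are sorted from wide to narrow categories. The
TAG_RANK table, its category assignments, and the canonicalize() helper are
all assumptions made up for this illustration; a real scheme would need an
agreed registry of tags.

```python
# Hypothetical ranking: a lower rank means a wider category.
TAG_RANK = {
    "pc": 1, "unknown": 1,                 # vendor
    "linux": 2, "windows": 2,              # kernel
    "gnu": 3, "mingw": 3, "android": 3,    # system
    "musl": 4, "ucrt": 4, "msvcrt": 4,     # libc / C runtime
}


def canonicalize(cpu: str, tags: list) -> str:
    """Join the CPU type and a set of tags in canonical wide-to-narrow order."""
    unknown = [t for t in tags if t not in TAG_RANK]
    if unknown:
        raise ValueError(f"unknown tags: {unknown}")
    # Sort by category rank, then by name so ties are resolved deterministically.
    ordered = sorted(set(tags), key=lambda t: (TAG_RANK[t], t))
    return "-".join([cpu, *ordered])
```

Under these assumptions, canonicalize("x86_64", ["musl", "gnu", "linux", "pc"])
yields the existing-style string x86_64-pc-linux-gnu-musl regardless of the
order in which the tags were given.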
-- Jacob