On 8/24/23 23:54, Jacob Bachmeyer wrote:
John Ericson wrote:
This is why I opened with "Operating System" lacks a coherent
objective definition.
[...]
As I understand, historically, "operating systems" were proprietary
monoliths and the GNU Project originally expected to produce another
monolith, but /our/ monolith would be Free Software. As an interim
measure, the GNU utilities were designed to be widely portable across
the various individually-monolithic proprietary operating systems then
in use across a wide variety of hardware. The broader Free Software
Movement unexpectedly shattered that state of affairs, leading to the
4-element configuration tuple form, when the Linux kernel became
available and it was noticed that---oops!---GNU on Linux and GNU on
HURD would have significant differences that at least some of the GNU
packages would need to handle. (For example, GNU libc is very
different between Linux, where POSIX I/O maps fairly directly to
underlying syscalls, and HURD, where POSIX I/O must be translated to
Mach IPC, but both of these are Free GNU systems.)
This means that the GNU system is a somewhat blurry category, with
many variants possible, and is orthogonal to "Linux": there are
GNU/Linux systems, GNU systems using other kernels, and Linux-based
systems not using GNU at all. This latter category is fairly common
in embedded systems, where the GNU utilities are often eschewed for
lighter-weight alternatives to save flash space (or, less honorably,
to avoid GPL3).
Yes I agree with this state of affairs. I sometimes (but not always!)
detect a sort of "Linux Scooped us" sentiment in GNU quarters, but as I
see it portability and diversity of distros was pretty much inevitable
--- replacing proprietary Unix userlands with GNU software was a huge
point in how GNU got going in academic/institutional environments in the
early days, and even if Hurd got there before Linux there would be no
reason to rip out that portability.
JSON is pretty much a hard no for me: it is far too complex for what
really needs to be a simple structure. Flat strings work very well
for the way that GNU software typically expects to parse a
configuration tuple using shell constructs. Perhaps it would be
better to redefine configuration tuples as a flat list of tags with a
canonical ordering? (The reason for a canonical ordering is in part
to ensure that all existing coherent configuration tuple strings
remain valid and to ensure that text-based pattern matching continues
to work.)
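
As a rough illustration of the text-based pattern matching mentioned above, here is a minimal Python sketch that mimics a shell `case` statement over flat tuple strings. The patterns and labels are made up for this example and are not taken from config.sub or from any existing configure script.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, checked in order like the arms of a shell `case`.
PATTERNS = [
    ("*-*-linux-gnu*", "GNU/Linux"),
    ("*-*-linux-musl*", "Linux with musl"),
    ("*-*-windows-*", "Windows"),
]


def classify(tuple_string: str) -> str:
    """Return the label of the first glob pattern matching the tuple."""
    for pattern, label in PATTERNS:
        if fnmatchcase(tuple_string, pattern):
            return label
    return "unknown"
```

Because the match is purely textual, appending a fifth element (say, a libc tag) to an existing string still hits the same pattern, which is the property a canonical ordering is meant to preserve.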
Ah sorry, I shouldn't have made reference to JSON at all --- what I
really was getting at is the /abstract syntax/. In particular, rather
than having an abstract syntax of "list of strings" (parsing today's
concrete syntax by breaking on dash), where the meaning of each string
is ambiguous / context-sensitive, we would have one of "keys mapped to
enumerations", i.e. one always knows the meaning of each component
explicitly / without inspecting it or its context.
JSON or your flat list in canonical ordering (where I assume we are
careful to never skip a type of component) are both valid concrete
syntaxes that can be parsed / printed from this abstract syntax.
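
To make the abstract-syntax idea concrete, here is a minimal Python sketch, not a proposal for the real schema. The key names (`cpu`, `vendor`, `kernel`, `env`) and the enumeration members are assumptions chosen for illustration; the point is only that one abstract value can be printed as either a dashed string or JSON, and that parsing the dashed form is trivial when no component is ever skipped.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Cpu(Enum):  # hypothetical enumeration, deliberately tiny
    I686 = "i686"
    X86_64 = "x86_64"
    AARCH64 = "aarch64"


class Kernel(Enum):  # hypothetical
    LINUX = "linux"
    WINDOWS = "windows"


class Env(Enum):  # hypothetical
    GNU = "gnu"
    MUSL = "musl"
    MINGW = "mingw"


@dataclass(frozen=True)
class Config:
    """Abstract syntax: every component has an explicit key."""
    cpu: Cpu
    vendor: str
    kernel: Kernel
    env: Env

    def to_flat(self) -> str:
        """Concrete syntax 1: dashed string in a fixed order."""
        return "-".join(
            [self.cpu.value, self.vendor, self.kernel.value, self.env.value]
        )

    def to_json(self) -> str:
        """Concrete syntax 2: JSON object with the same keys."""
        return json.dumps({
            "cpu": self.cpu.value,
            "vendor": self.vendor,
            "kernel": self.kernel.value,
            "env": self.env.value,
        })

    @classmethod
    def from_flat(cls, text: str) -> "Config":
        """Parse the dashed form; assumes exactly four components."""
        cpu, vendor, kernel, env = text.split("-")
        return cls(Cpu(cpu), vendor, Kernel(kernel), Env(env))
```

An unknown component fails loudly (`Cpu("foo")` raises `ValueError`) rather than being silently reinterpreted by position, which is the behaviour the "keys mapped to enumerations" framing is after.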
-----------
Concretely, I think these are pretty clear configs:
CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish
things, TODO distinguish between MSVCRT and UCRT
I say that this one really should just be *-mingw.
Sure. I went with mingnu because the "w" is redundant with the
"windows", but ultimately I care more about the pattern than the exact
choice of identifiers / enumeration tags. (As we say in programming
language land, I care about the thing "up to alpha-renaming".)
Note that there are both MinGW32 and MinGW64, corresponding to 32-bit
and 64-bit Windows APIs. Should that be included or should the CPU
type be used to distinguish? (e.g. i686-pc-windows-mingw is MinGW32
and x86_64-pc-windows-mingw is MinGW64?)
Yes I think so. If you look at https://www.mingw-w64.org/downloads/ one
even sees x86_64-w64-mingw32, which is quite something, and 64-bit!
I think what happened is that "w32" was chosen to mean the then-new
win32 API/ABI, as opposed to DOS. Win64 as I understand is necessarily a
new ABI because of the change in CPU arch, but not really a new API,
being more of a "let's make the minimal amount of changes so the
source/headers are portable" situation. So a combination of "same API"
and "too lazy to update GNU config" made "mingw32" stick around.
f16804b79ee5a23a9994a1cdc760cd9ba813148a added mingw64 to GNU config in
2012, which is long after the advent of 64-bit Windows.
In the proposed five-element form, MSVCRT and UCRT are easily
distinguished. Example:
i686-pc-windows-mingw-msvcrt
i686-pc-windows-mingw-ucrt
x86_64-pc-windows-mingw-msvcrt
x86_64-pc-windows-mingw-ucrt
That is very true, I will grant you that :)
CPU-VENDOR-windows-cygnus # Cygwin
CPU-VENDOR-windows-msys # MSYS2, a lot like Cygwin
CPU-VENDOR-windows-msvc # MS C + MS C++
CPU-VENDOR-linux-gnu # gnu libc
CPU-VENDOR-linux-musl # musl libc
CPU-VENDOR-linux-android # bionic libc
I know Po Lu doesn't like them, because they overlap with existing
ones. But what about you two, Adam and Jacob? I am trying to
compromise between what various things do already, and also
correct things like windows-gnu (even if there is no such thing as
the GNU operating system (only multiple GNU Hurd-supporting distros),
I agree that MinGW is clearly not a complete enough set of GNU
software to earn the right to drop the "minimal" part).
The logical problem with your parenthetical is that it ignores
GNU/Linux, which *is* also a GNU system.
Hmm? I meant keep -gnu only for things which actually use GNU libc. Now
I suppose something could use GNU libc but be really different in other
ways from a real GNU system, but I am not really sure where to draw the
line. There is a blurry grey area of "uses GNU libc, but not sure if it
counts as GNU", no?
I also quibble with CPU-VENDOR-linux-gnu and CPU-VENDOR-linux-musl.
Android and GNU are different operating systems that both (can) use
the Linux kernel, so I agree with CPU-VENDOR-linux-android for
Android. The other two I see as: *-*-linux-gnu --- the GNU/Linux
system, using GNU libc unless otherwise specified; *-*-linux-musl ---
some unspecified Linux-based system using Musl libc, not necessarily
using GNU.
With the proposed five-element form, the ambiguity is resolved:
*-*-linux-gnu-musl --- a variant GNU/Linux system, using Musl libc.
Similar to the above, I know when something is/isn't using a specific
libc, but any other distinction seems very blurry to me. See also what
Connor wrote (perhaps more diplomatically than my "operating systems are
inherently subjective!" bombast :))
If we can accept these, I think I will have no problem getting LLVM
to accept windows-mingnu, and perhaps even warn/deprecate windows-gnu.
I still say this should be windows-mingw, but yes "windows-gnu" should
definitely be deprecated, removed, and reserved in case someone
actually ports a POSIX GNU environment to Windows.
Yeah whatever windows-something we settle on for MinGW, I promise my
offer still stands to try to get LLVM to (a) accept it, and (b)
steer people away from windows-gnu towards it.
After that, I think we are close enough to convene a working group for
a JSON/whatever explicit standard. And that would be amazing.
I still oppose JSON because it is way too verbose for this:
configuration tuples need to be both expressive and simple enough to
type at a shell prompt as arguments to configure. Using JSON by
default would also be a very nasty "flag day" that would break all
existing programs that use config.sub. Perhaps config.sub could
accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably better
for things like Autoconf which needs to parse the output of config.sub
either way.
I am even OK if the dash-separating is a sort of "legacy mode" thing
that remains ambiguous so long as one can always convert the unambiguous
form (e.g. JSON) to it with much less logic than config.sub. (e.g. do
all the work in "normalize to JSON", and make "JSON to old format" a
very simple follow-up step.)
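
A minimal Python sketch of how small that follow-up step could be, assuming the normalization has already produced a JSON object. The key names and the choice of `libc` as an optional trailing component are assumptions for this example only; nothing here reflects an existing config.sub interface.

```python
import json

# Assumed key order for the dashed form; "libc" is treated as optional.
REQUIRED = ("cpu", "vendor", "kernel", "env")
OPTIONAL = ("libc",)


def json_to_legacy(text: str) -> str:
    """Turn an already-normalized JSON object into the dashed form."""
    fields = json.loads(text)
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        raise ValueError(f"missing components: {missing}")
    parts = [fields[key] for key in REQUIRED]
    parts += [fields[key] for key in OPTIONAL if key in fields]
    return "-".join(parts)
```

There are no alias tables or heuristics in this direction: the keys say what each value means, so the conversion is a lookup and a join.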
An alternate proposal hinted at above is to redefine configuration
tuples as a flat tag list with canonical ordering. For example, a CPU
type always comes first, but the rest is just a set of tags further
describing the system, generally working from wide categories (like
CPU architecture) to narrow categories (like choice of libc). A
larger single installation could easily have some variety in the
narrower categories; a network cluster running a single system image
(which I understand is an eventual goal for HURD) could even have a
variety of CPU types.
Yeah I think the "increasingly narrowing" way of thinking about it
(almost like a converging sequence from Calculus) is very good.
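
Here is a minimal Python sketch of the flat-tag-list idea with a canonical, wide-to-narrow ordering. The category names, their ranks, and the tag vocabulary are all hypothetical; a real list would have to be agreed on and would be much longer.

```python
# Hypothetical category ranks, widest category first.
CATEGORY_RANK = {"cpu": 0, "vendor": 1, "kernel": 2, "system": 3, "libc": 4}

# Hypothetical tag vocabulary: each tag belongs to exactly one category.
TAG_CATEGORY = {
    "i686": "cpu", "x86_64": "cpu",
    "pc": "vendor", "unknown": "vendor",
    "linux": "kernel", "windows": "kernel",
    "gnu": "system", "mingw": "system", "android": "system",
    "musl": "libc", "msvcrt": "libc", "ucrt": "libc",
}


def canonicalize(tags):
    """Order a set of tags from wide to narrow and join them with dashes.

    Raises KeyError for an unknown tag and ValueError when two different
    tags claim the same category.
    """
    chosen = {}
    for tag in tags:
        category = TAG_CATEGORY[tag]
        if category in chosen and chosen[category] != tag:
            raise ValueError(f"conflicting {category} tags: "
                             f"{chosen[category]!r} and {tag!r}")
        chosen[category] = tag
    ordered = sorted(chosen.items(), key=lambda item: CATEGORY_RANK[item[0]])
    return "-".join(tag for _, tag in ordered)
```

With this toy vocabulary, an existing string such as x86_64-pc-linux-gnu is already in canonical order and passes through unchanged, while a shuffled tag list for the same system normalizes to the same string.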
But to Adam's point, I think it is good that we recognize that while
there are 5-element tuples today (e.g. in the LLVM test suite), I don't
think any of them do OS-LIBC; instead I see things like
aarch64-unknown-windows-gnu-elf. Not saying OS-LIBC is inherently bad
(though I do have some reservations like Connor's), just that OS-LIBC is
novel.
John