On 2026-03-14 05:04, Bruno Haible wrote:

> here, clearly, everyone
> is assuming that
>     uint8_t == 'unsigned char' has 8 bits,
>     uint16_t == 'unsigned short' has 16 bits,
>     uint32_t == 'unsigned int' has 32 bits,
>     uint64_t == 'unsigned long long' has 64 bits,
>     and two's complement.

That depends on what is meant by "here". Yes, in parts of Gnulib we assume that. However, stdbit.in.h doesn't assume that anywhere, not even in the byteswap.in.h that it sometimes includes. The only place those types appear is in a stdbit.in.h comment. Come to think of it, that comment's types should be changed to match the code; done in the first attached patch.

To some extent the "least" types are an intellectual exercise. It reminds me of Knuth's MIX machine, where the machine words are either base 64 or base 100 and your programs are supposed to work either way; although this made some sense in the early 1960s, base-10 hardware died out long ago.

The main thing different here is that the Burroughs B5000 etc. architectures are still commercially viable, even if they are not likely Gnulib targets. Still, it can be fun to make the code portable so long as that doesn't significantly hurt performance (or later maintenance...), which I hope is the case here.


> Yes, this is my confusion: If, say, uint16_t does not exist at all, how
> is uint_least16_t represented?

On such platforms it's an unsigned integer type containing at least 16 value bits. There may be padding bits. It has the smallest number of value bits among all such unsigned integer types.

For example, on the B5000 architecture I mentioned, uint_least16_t would be equivalent to unsigned int, which has 39 value bits, as that machine supports unsigned arithmetic as the low-order 39 bits of a 48-bit machine word. The high-order 9 bits are padding, for that type. (I see now that the Gnulib documentation is off by 1 in this area; fixed in the 2nd attached patch.)
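
To see what a given platform does, one can count the value bits of UINT_LEAST16_MAX and compare that with the type's storage size. Here is a minimal sketch; the two function names are made up for illustration and are not part of Gnulib or <stdbit.h>.

```c
#include <limits.h>
#include <stdint.h>

/* Return the number of value bits in uint_least16_t, found by
   counting the bits of its maximum value.  */
static int
uint_least16_value_bits (void)
{
  int bits = 0;
  for (uint_least16_t max = UINT_LEAST16_MAX; max != 0; max >>= 1)
    bits++;
  return bits;
}

/* Return the number of padding bits in uint_least16_t:
   storage bits minus value bits.  */
static int
uint_least16_padding_bits (void)
{
  return (int) (sizeof (uint_least16_t) * CHAR_BIT)
         - uint_least16_value_bits ();
}
```

Where uint16_t exists, these should return 16 and 0. On a layout like the B5000 one described above, they would presumably return 39 and 9.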


> And do we have to write
>    x & 0xFFFFU
> instead of
>    (uint16_t) x
> then?

Yes, if X might be 2**16 or greater. That doesn't happen in the functions we're discussing, though, so the & 0xFFFF is not needed.
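
For code where the value might be that large, the explicit mask can be wrapped in a helper. This is only a sketch, and the name low16 is hypothetical:

```c
#include <stdint.h>

/* Reduce X to its low-order 16 bits.  Unlike a conversion to
   uint_least16_t alone, the mask gives the same answer even on
   platforms where that type has more than 16 value bits.  */
static uint_least16_t
low16 (unsigned long int x)
{
  return x & 0xFFFFU;
}
```

For example, low16 (0x12345) yields 0x2345 regardless of how wide uint_least16_t is.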


> And similarly, will a conversion (implicit or cast) from
> uint_least16_t to int_least16_t extend the sign bit?

It is like converting unsigned int to int: if the value in question exceeds INT_LEAST16_MAX, in C99 and later the result is implementation-defined or an implementation-defined signal is raised. (C89 did not allow for the signal.)

I imagine that the option for the signal was put into C99 to allow for debugging implementations; however, I don't know of any platform, even a debugging platform, that does that. Instead, as far as I know all platforms copy the word value unmodified, and the integer's sign bit is taken from !!(value & 0x8000U) though the C standard does not require this. Come to think of it, Gnulib code assumes the usual behavior, and this should be documented; done by installing the 3rd attached patch.

This issue cannot occur on the B5000 architecture, as INT_LEAST16_MAX == UINT_LEAST16_MAX there. It can occur on the UNIVAC 1103 architecture, which uses ones' complement. However, the latter architecture has CHAR_BIT == 9, so in some sense it wouldn't need to worry about this issue, as C2y requires these new functions only when CHAR_BIT == 8.
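
Code that would rather not rely on the usual behavior can do the conversion with arithmetic that stays in range at every step. The following is a sketch under that assumption, not something Gnulib does; the name to_signed16 is hypothetical.

```c
#include <stdint.h>

/* Interpret the low-order 16 bits of U as a two's complement number,
   without relying on implementation-defined conversion to a signed
   type.  The result -32768 assumes that int_least16_t can represent
   it, which holds on two's complement platforms.  */
static int_least16_t
to_signed16 (uint_least16_t u)
{
  u &= 0xFFFFU;
  if (u <= 0x7FFFU)
    return u;  /* In range, so the conversion is value-preserving.  */
  /* Here 0x8000 <= u <= 0xFFFF, so 0xFFFF - u is in 0..0x7FFF and the
     result u - 0x10000 is computed as -1 - (0xFFFF - u).  */
  return (int_least16_t) (-1 - (int_least16_t) (0xFFFFU - u));
}
```

With this, to_signed16 (0xFFFF) is -1 and to_signed16 (0x8000) is -32768, which matches what the usual copy-the-bits conversion gives on two's complement machines.
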
From c50476417d6a9908d30cad69c5ad90339eacd047 Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sat, 14 Mar 2026 11:30:35 -0700
Subject: [PATCH 1/3] stdbit-h: adjust comments to match code
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* lib/stdbit.in.h: Adjust commentary to match code’s types
and use of if rather than #if.
---
 lib/stdbit.in.h | 46 ++++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/lib/stdbit.in.h b/lib/stdbit.in.h
index 893e1cce71..8e24060f3a 100644
--- a/lib/stdbit.in.h
+++ b/lib/stdbit.in.h
@@ -1124,76 +1124,70 @@ stdc_bit_ceil_ull (unsigned long long int n)
    option '-fno-strict-aliasing' is no viable solution.
    So, this definition won't work:
 
-     uint16_t
+     uint_least16_t
      load16 (const unsigned char ptr[2])
      {
-       return *(const uint16_t *)ptr;
+       return *(const uint_least16_t *)ptr;
      }
 
    Instead, the following definitions are candidates:
 
      // Trick from Lasse Collin: use memcpy and __builtin_assume_aligned.
-     uint16_t
+     uint_least16_t
      load16_a (const unsigned char ptr[2])
      {
-       uint16_t value;
+       uint_least16_t value;
        memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
        return value;
      }
 
      // Use __builtin_assume_aligned, without memcpy.
-     uint16_t
+     uint_least16_t
      load16_b (const unsigned char ptr[2])
      {
        const unsigned char *aptr =
          (const unsigned char *) __builtin_assume_aligned (ptr, 2);
-       #if WORDS_BIGENDIAN
-       return ((uint16_t) aptr [0] << 8) | (uint16_t) aptr [1];
-       #else
-       return (uint16_t) aptr [0] | ((uint16_t) aptr [1] << 8);
-       #endif
+       return (_GL_STDBIT_BIGENDIAN
+               ? ((uint_least16_t) aptr [0] << 8) | (uint_least16_t) aptr [1]
+               : (uint_least16_t) aptr [0] | ((uint_least16_t) aptr [1] << 8));
      }
 
      // Use memcpy and __assume.
-     uint16_t
+     uint_least16_t
      load16_c (const unsigned char ptr[2])
      {
        __assume (((uintptr_t) ptr & (2 - 1)) == 0);
-       uint16_t value;
+       uint_least16_t value;
        memcpy (&value, __builtin_assume_aligned (ptr, 2), 2);
        return value;
      }
 
      // Use __assume, without memcpy.
-     uint16_t
+     uint_least16_t
      load16_d (const unsigned char ptr[2])
      {
        __assume (((uintptr_t) ptr & (2 - 1)) == 0);
-       #if WORDS_BIGENDIAN
-       return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
-       #else
-       return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
-       #endif
+       return (_GL_STDBIT_BIGENDIAN
+               ? ((uint_least16_t) ptr [0] << 8) | (uint_least16_t) ptr [1]
+               : (uint_least16_t) ptr [0] | ((uint_least16_t) ptr [1] << 8));
      }
 
      // Use memcpy, without __builtin_assume_aligned or __assume.
-     uint16_t
+     uint_least16_t
      load16_e (const unsigned char ptr[2])
      {
-       uint16_t value;
+       uint_least16_t value;
        memcpy (&value, ptr, 2);
        return value;
      }
 
      // Use the code for the unaligned case.
-     uint16_t
+     uint_least16_t
      load16_f (const unsigned char ptr[2])
      {
-       #if WORDS_BIGENDIAN
-       return ((uint16_t) ptr [0] << 8) | (uint16_t) ptr [1];
-       #else
-       return (uint16_t) ptr [0] | ((uint16_t) ptr [1] << 8);
-       #endif
+       return (_GL_STDBIT_BIGENDIAN
+               ? ((uint_least16_t) ptr [0] << 8) | (uint_least16_t) ptr [1]
+               : (uint_least16_t) ptr [0] | ((uint_least16_t) ptr [1] << 8));
      }
 
    Portability constraints:
-- 
2.51.0

From ad2b13fefd5e328921da1b1356f05e9d82ea7841 Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sat, 14 Mar 2026 12:21:55 -0700
Subject: [PATCH 2/3] doc: fix typos re Unisys ClearPath Libra

---
 doc/gnulib-intro.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/gnulib-intro.texi b/doc/gnulib-intro.texi
index e8d36102b8..b7375f2059 100644
--- a/doc/gnulib-intro.texi
+++ b/doc/gnulib-intro.texi
@@ -344,8 +344,8 @@ This platform's architecture descends from the UNIVAC 1103 (1953).
 @item
 The Unisys ClearPath Libra's machine word is 48 bits
 with a 4-bit tag and a 4-bit data extension.  Its
-@code{unsigned int} uses the low-order 40 bits of the word, and
-@code{int} uses the low-order 41 bits of the word with a
+@code{unsigned int} uses the low-order 39 bits of the word, and
+@code{int} uses the low-order 40 bits of the word with a
 signed-magnitude representation that conforms to C17 and earlier
 but not to C23 and later.  On these machines, @code{INT_MAX ==
 UINT_MAX}, @code{INT_MIN == -INT_MAX}, and @code{sizeof (int) == 6}.
-- 
2.51.0

From c28c982fc6990d41f481b9f79ce67de651407b8a Mon Sep 17 00:00:00 2001
From: Paul Eggert <[email protected]>
Date: Sat, 14 Mar 2026 12:23:13 -0700
Subject: [PATCH 3/3] doc: assume copy to int preserves low order bits

---
 doc/gnulib-readme.texi | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/doc/gnulib-readme.texi b/doc/gnulib-readme.texi
index 1da97bc5ba..1f94fd03e5 100644
--- a/doc/gnulib-readme.texi
+++ b/doc/gnulib-readme.texi
@@ -488,18 +488,28 @@ and the GNU coding standards both require this.
 @item
 Signed integer arithmetic is two's complement.
 
-Previously, Gnulib code sometimes also assumed that signed integer
-arithmetic wraps around, but modern compiler optimizations
-sometimes do not guarantee this, and Gnulib code with this
-assumption is now considered to be questionable.
-@xref{Integer Properties}.
-
 Although some Gnulib modules contain explicit support for
 ones' complement and signed magnitude integer representations,
 which are allowed by C17 and earlier,
 these modules are the exception rather than the rule.
 All practical Gnulib targets use two's complement, which is required by C23.
 
+@item
+When an out-of-range integer value is copied to a signed integer,
+low-order bits are copied and high-order bits are silently discarded.
+
+Although the C standard says that the resulting signed value is
+implementation-defined or an implementation-defined signal is raised,
+all known platforms simply copy the low-order bits,
+just as the C standard requires when copying to an unsigned integer.
+
+This assumption is merely about copying data, not about arithmetic.
+Previously, Gnulib code sometimes also assumed that signed integer
+arithmetic wraps around, but modern compiler optimizations
+sometimes do not guarantee this, and Gnulib code with this
+assumption is now considered to be questionable.
+@xref{Integer Properties}.
+
 @item
 There are no ``holes'' in integer values: all the bits of an integer
 contribute to its value in the usual way.
-- 
2.51.0
