https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123635

            Bug ID: 123635
           Summary: _BitInt bitint_extended vs. abi_limb_prec > limb_prec
                    or even just bitint_extended
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

In GCC 16 some targets (s390x, loongarch, riscv, and another one posted but not
committed yet) decided to use info->extended = true, something that hadn't been
supported before.

Of these, s390x thankfully uses limb_prec == abi_limb_prec, loongarch uses the
weird model of extending just within the same limb while sometimes keeping the
most significant limb_prec bits undefined, and riscv and arm chose to go with
abi_limb_prec > limb_prec.

Now, the riscv/arm way is definitely not handled correctly by the lowering code
right now, and riscv doesn't even adjust bitintext.h, so the extensions don't
get any testing there.
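
Just to spell out what the riscv/arm model is supposed to guarantee, here is a
rough sketch (not GCC code; it assumes limb_prec == 32, abi_limb_prec == 64 and
little-endian limb order, which is what the riscv dumps below suggest, and the
helper is made up for illustration) of what a properly extended signed
_BitInt(129) has to look like, i.e. extension through all limbs up to the
abi_limb_prec boundary:

#include <stdint.h>
#include <stdio.h>

#define PREC 129
#define LIMB_PREC 32
#define ABI_LIMB_PREC 64
#define CEIL(x,y) (((x) + (y) - 1) / (y))
/* Number of limb_prec limbs the object occupies, 6 for _BitInt(129).  */
#define TOTAL (CEIL (PREC, ABI_LIMB_PREC) * ABI_LIMB_PREC / LIMB_PREC)

static void
extend_bitint129 (uint32_t limbs[TOTAL])
{
  unsigned msb_limb = PREC / LIMB_PREC;  /* limb 4 holds bit 128 */
  unsigned partial = PREC % LIMB_PREC;   /* 1 valuable bit in it */
  uint32_t sign = -((limbs[msb_limb] >> (partial - 1)) & 1);
  /* Sign extend within the limb containing the most significant bit ...  */
  limbs[msb_limb] = (limbs[msb_limb] & ((1U << partial) - 1)) | (sign << partial);
  /* ... and also fill the remaining limbs up to the abi_limb_prec boundary,
     which is the part the riscv lowering currently never stores.  */
  for (unsigned i = msb_limb + 1; i < TOTAL; i++)
    limbs[i] = sign;
}

int
main ()
{
  /* -2^128 with garbage in the padding limb.  */
  uint32_t x[TOTAL] = { 0, 0, 0, 0, 1, 0xdeadbeef };
  extend_bitint129 (x);
  printf ("%08x %08x\n", x[4], x[5]);  /* ffffffff ffffffff */
  return 0;
}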

--- gcc/gimple-lower-bitint.cc.jj       2026-01-08 23:03:09.034192079 +0100
+++ gcc/gimple-lower-bitint.cc  2026-01-16 18:23:21.899066641 +0100
@@ -599,7 +599,11 @@ bitint_large_huge::limb_access_type (tre
     return m_limb_type;
   unsigned HOST_WIDE_INT i = tree_to_uhwi (idx);
   unsigned int prec = TYPE_PRECISION (type);
-  gcc_assert (i * limb_prec < prec);
+  gcc_assert (i * limb_prec < prec
+             || (bitint_extended
+                 && abi_limb_prec > limb_prec
+                 && i * limb_prec
+                    < CEIL (prec, abi_limb_prec) * abi_limb_prec));
   if (bitint_big_endian
       ? (i != 0 || (prec % limb_prec) == 0)
       : (i + 1) * limb_prec <= prec)
@@ -2866,6 +2870,16 @@ bitint_large_huge::lower_mergeable_stmt
     = (prec != (unsigned) TYPE_PRECISION (type)
        && (CEIL ((unsigned) TYPE_PRECISION (type), limb_prec)
           > CEIL (prec, limb_prec)));
+  if (bitint_extended
+      && !eq_p
+      && abi_limb_prec > limb_prec
+      && ((CEIL ((unsigned) TYPE_PRECISION (type), abi_limb_prec)
+          * abi_limb_prec / limb_prec) > CEIL (prec, limb_prec)))
+    {
+      if (prec == (unsigned) TYPE_PRECISION (type))
+       sext = !TYPE_UNSIGNED (type);
+      separate_ext = true;
+    }
   unsigned dst_idx_off = 0;
   if (separate_ext && bitint_big_endian)
     dst_idx_off = (CEIL ((unsigned) TYPE_PRECISION (type), limb_prec)
@@ -3104,6 +3118,8 @@ bitint_large_huge::lower_mergeable_stmt
       unsigned start = CEIL (prec, limb_prec);
       prec = TYPE_PRECISION (type);
       unsigned total = CEIL (prec, limb_prec);
+      if (bitint_extended && abi_limb_prec > limb_prec)
+       total = CEIL (prec, abi_limb_prec) * abi_limb_prec / limb_prec;
       idx = idx_first = idx_next = NULL_TREE;
       if (prec <= (start + 2 + (bo_shift != 0)) * limb_prec)
        kind = bitint_prec_large;
--- gcc/testsuite/gcc.dg/bitintext.h.jj 2025-08-06 12:02:36.825137182 +0200
+++ gcc/testsuite/gcc.dg/bitintext.h    2026-01-16 18:25:40.054708879 +0100
@@ -16,7 +16,7 @@ do_copy (void *p, const void *q, __SIZE_
 #define CEIL(x,y) (((x) + (y) - 1) / (y))

 /* Promote a _BitInt type to include its padding bits.  */
-#if defined (__s390x__) || defined(__arm__)
+#if defined (__s390x__) || defined(__arm__) || defined(__riscv)
 #define PROMOTED_SIZE(x) sizeof (x)
 #elif defined(__loongarch__)
 #define PROMOTED_SIZE(x) (sizeof (x) > 8 ? CEIL (S (x), 64) * 8 : sizeof (x))
@@ -24,7 +24,8 @@ do_copy (void *p, const void *q, __SIZE_

 /* Macro to test whether (on targets where psABI requires it) _BitInt
    with padding bits have those filled with sign or zero extension.  */
-#if defined(__s390x__) || defined(__arm__) || defined(__loongarch__)
+#if defined(__s390x__) || defined(__arm__) || defined(__loongarch__) \
+    || defined(__riscv)
 #define BEXTC1(x, uns) \
   do {                                                   \
     uns _BitInt(PROMOTED_SIZE (x) * __CHAR_BIT__) __x;   \

The above is my completely untested patch to at least make lower_mergeable_stmt
extend in the _BitInt(N * 128 + 1) to _BitInt(N * 128 + 64) cases for N >= 1.
It will result in worse code generation on loongarch though, which chose not to
do that, so I guess we need some new bool in struct bitint_info to differentiate
between the loongarch and riscv/arm kinds of extension in the abi_limb_prec >
limb_prec case.
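
Purely as an illustration of what such a flag could mean (the field name below
is invented, nothing like it exists in struct bitint_info right now):

#include <stdbool.h>

/* Hypothetical sketch only, not the real struct bitint_info.  */
struct bitint_info_sketch
{
  /* ... the existing members elided ... */
  bool extended;                 /* padding bits are sign/zero extended */
  bool extended_up_to_abi_limb;  /* hypothetical: true for riscv/arm, where the
                                    extension covers every limb_prec limb up to
                                    the abi_limb_prec boundary; false for
                                    loongarch, which only extends within the
                                    limb containing the most significant bit */
};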

Anyway, without this patch, on say:
_BitInt(129) x, y;

void
foo ()
{
  x += y;
}

void
bar (int z)
{
  x <<= z;
}

void
baz (int z)
{
  x >>= z;
}

void
qux (long double z)
{
  x = z;
}

void
corge ()
{
  x *= y;
}
one can see on riscv that it never fully updates the most significant 64 bits;
it only stores MEM[&var + 16B] and never MEM[&var + 20B].
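
A BEXTC-style runtime check in the spirit of bitintext.h would catch that; a
rough sketch (reusing x and foo from the testcase above and assuming
sizeof (_BitInt(129)) == 24 bytes with the psABI requiring full extension):

#include <string.h>
#include <stdlib.h>

extern _BitInt(129) x, y;
extern void foo (void);

/* If the 63 padding bits of x are properly sign extended, reinterpreting all
   24 bytes of x as a _BitInt(192) must give the same value as the normal
   sign-extending conversion.  */
static void
check_x_extended (void)
{
  _BitInt(192) promoted;
  memcpy (&promoted, &x, sizeof (promoted));
  if (promoted != (_BitInt(192)) x)
    abort ();
}

int
main ()
{
  y = -1;
  foo ();               /* x += y; the padding bits of x must now be all ones */
  check_x_extended ();
  return 0;
}
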
With the patch the difference in bitintlower1 is:
@@ -39,6 +39,9 @@ void foo ()
   <unnamed-signed:1> _31;
   <unnamed-signed:1> _32;
   unsigned int _33;
+  signed int _34;
+  signed int _35;
+  unsigned int _36;

   <bb 2> [local count: 1073741824]:

@@ -80,6 +83,10 @@ void foo ()
   _32 = _31 + _30;
   _33 = (unsigned int) _32;
   MEM <unsigned int> [(_BitInt(129) *)&x + 16B] = _33;
+  _34 = (signed int) _33;
+  _35 = _34 >> 31;
+  _36 = (unsigned int) _35;
+  MEM <unsigned int> [(_BitInt(129) *)&x + 20B] = _36;
   return;

 }
@@ -382,6 +389,9 @@ void corge ()
   unsigned int _10;
   <unnamed-signed:1> _11;
   unsigned int _12;
+  signed int _13;
+  signed int _14;
+  unsigned int _15;

   <bb 2> [local count: 1073741824]:

@@ -402,6 +412,10 @@ void corge ()
   _11 = MEM <<unnamed-signed:1>> [(_BitInt(129) *)&x + 16B];
   _12 = (unsigned int) _11;
   MEM[(unsigned int *)&bitint.50 + 16B] = _12;
+  _13 = (signed int) _12;
+  _14 = _13 >> 31;
+  _15 = (unsigned int) _14;
+  MEM[(unsigned int *)&bitint.50 + 20B] = _15;
   .MULBITINT (&x, 129, &bitint.50, -129, &y, -129);
   return;

The change in corge is not really needed but hard to avoid; the change
in foo is right.
But one needs to do something similar for all the other cases which don't go
through lower_mergeable_stmt, in particular the shift cases, casts from float
to _BitInt, multiplication/division/modulo, __builtin_{add,sub}_overflow,
__builtin_mul_overflow, etc.
E.g. for casts from float/decimal or multiplication/division/modulo (i.e. where
a libgcc function is used), I guess the best approach would be not to pass
TYPE_PRECISION, but for bitint_extended to pass CEIL (prec, limb_prec) *
limb_prec in the loongarch/s390x case and CEIL (prec, abi_limb_prec) *
abi_limb_prec in the riscv/arm case, i.e. always ask libgcc to fill in all the
needed limbs.
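
Spelled out as plain arithmetic (just a sketch; the limb/abi limb sizes are the
ones assumed earlier, not values taken from the actual backends, and the helper
name is made up):

#include <stdio.h>

#define CEIL(x,y) (((x) + (y) - 1) / (y))

/* Precision to request from the libgcc entry points instead of
   TYPE_PRECISION; the sign of the argument (the -129 seen in the .MULBITINT
   call above marks a signed operand) is left out here.  */
static unsigned
libgcc_prec (unsigned prec, unsigned limb_prec, unsigned abi_limb_prec,
	     int extend_up_to_abi_limb)
{
  if (!extend_up_to_abi_limb)
    return CEIL (prec, limb_prec) * limb_prec;           /* loongarch/s390x style */
  return CEIL (prec, abi_limb_prec) * abi_limb_prec;     /* riscv/arm style */
}

int
main ()
{
  /* For the _BitInt(129) testcase with the 32-bit limbs and 64-bit abi limbs
     assumed earlier this asks libgcc for 192 bits instead of 160.  */
  printf ("%u %u\n", libgcc_prec (129, 32, 64, 1), libgcc_prec (129, 32, 64, 0));
  return 0;
}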

And even on s390x/loongarch I'm worried (but haven't tried to construct
testcases) e.g. about the separate_ext cases in lower_mergeable_stmt; with the
x86/aarch64 behavior of not extending anything the higher bits are simply
undefined, but if there is some widening conversion around a mergeable
operation, say
unsigned _BitInt(692) x;

void
foo (_BitInt(282) y, _BitInt(282) z)
{
  x = y + z;
}
then we compute the addition just for the first 4 limbs (on x86_64) in a loop,
then one limb specially (just 26 bits from it, then sign extended), and then in
a separate loop store the -1 or 0 value in the higher limbs.
But on bitint_extended arches that should be done only until we reach the
precision of the wider type (i.e. 692 bits); from there up to the type size
everything should be zeroed, because the type being stored into is unsigned,
not signed.
The other direction, a narrower unsigned type converted to a wider signed type,
is not a problem, because the widening fills with zeros, so the most
significant non-padding bit is zero and zeros are what should be stored in the
padding bits anyway.
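
For reference, the worked numbers for that example, assuming 64-bit limbs with
limb_prec == abi_limb_prec as on s390x (just the arithmetic, not GCC code):

#include <stdio.h>

#define CEIL(x,y) (((x) + (y) - 1) / (y))

int
main ()
{
  unsigned limb_prec = 64, src_prec = 282, dst_prec = 692;
  unsigned dst_limbs = CEIL (dst_prec, limb_prec);  /* 11 limbs, 704 bits */
  unsigned full = src_prec / limb_prec;             /* 4 full source limbs */
  unsigned partial = src_prec % limb_prec;          /* 26 bits in the next one */
  printf ("limbs 0..%u: addition loop\n", full - 1);
  printf ("limb %u: %u bits of the sum, then sign extended\n", full, partial);
  printf ("limbs %u..%u: separate_ext loop storing -1 or 0\n", full + 1,
	  dst_limbs - 1);
  printf ("bits %u..%u: padding of the unsigned _BitInt(692), must be zero "
	  "on bitint_extended arches\n", dst_prec, dst_limbs * limb_prec - 1);
  return 0;
}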

Oh, and the above patch is most likely also wrong for bitint_big_endian.
