Hi!

Since bswap has been converted from a method to a function, we miscompile the following testcase.  The problem is the assumption that the passed-in len argument (the number of limbs in the xval array) is an upper bound for the length of the bswap result, which is true only if precision is <= 64.  If precision is larger than that, e.g. 128 as in the testcase, and the argument has only one limb (i.e. a value from 0 to ~(unsigned HOST_WIDE_INT) 0), the result can still need 2 limbs for that precision, or generally up to BLOCKS_NEEDED (precision) limbs; it all depends on how many least significant limbs of the operand are zero.  bswap_large as implemented only cleared len limbs of the result, then swapped the bytes (invoking UB when ORing something into the limbs above those) and finally passed len to canonize, claiming that more limbs aren't needed.
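To illustrate the length issue outside of GCC, here is a minimal standalone sketch (not the wide-int.cc code itself; LIMB_BITS, the simplified BLOCKS_NEEDED macro and the limb layout are just mirrored for the example, under the assumption of 64-bit limbs stored least significant first).  It byte swaps a 128-bit value that fits in a single input limb and shows the result landing entirely in the limb above xlen:

/* Sketch only: the swapped result 2 << 120 lives in limb 1, i.e. above
   xlen == 1, which is why the result needs BLOCKS_NEEDED (precision)
   limbs cleared and reported, not xlen.  */
#include <stdio.h>

#define LIMB_BITS 64
#define PRECISION 128
#define BLOCKS_NEEDED(PREC) (((PREC) + LIMB_BITS - 1) / LIMB_BITS)

int
main (void)
{
  unsigned long long xval[1] = { 2 };	/* Input occupies a single limb.  */
  unsigned int xlen = 1;
  unsigned long long val[BLOCKS_NEEDED (PRECISION)] = { 0, 0 };

  for (unsigned int s = 0; s < PRECISION; s += 8)
    {
      unsigned int d = PRECISION - s - 8;
      unsigned int block = s / LIMB_BITS, offset = s % LIMB_BITS;
      /* Reads above xlen yield 0, like safe_uhwi does.  */
      unsigned long long byte
	= block < xlen ? (xval[block] >> offset) & 0xff : 0;
      val[d / LIMB_BITS] |= byte << (d % LIMB_BITS);
    }

  /* Prints 02000000000000000000000000000000, i.e. 2 << 120.  */
  printf ("%016llx%016llx\n", val[1], val[0]);
  return 0;
}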
The following patch fixes it by renaming len to xlen (so that it is clear it is X's length), using it solely as the safe_uhwi argument when we attempt to read from X, and using a new len = BLOCKS_NEEDED (precision) in the other two spots (i.e. when clearing the val array, which is also turned into a memset, and as the canonize argument).  wi::bswap asserts it isn't invoked on widest_int, so we are always invoked on wide_int or similar, and those have a preallocated result sized for the corresponding precision (i.e. BLOCKS_NEEDED (precision)).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-03  Jakub Jelinek  <ja...@redhat.com>

	PR middle-end/113722
	* wide-int.cc (wi::bswap_large): Rename third argument from len
	to xlen and adjust use in safe_uhwi.  Add len variable, set it
	to BLOCKS_NEEDED (precision) and use it for clearing of val and
	as canonize argument.  Clear val using memset instead of a loop.

	* gcc.dg/pr113722.c: New test.

--- gcc/wide-int.cc.jj	2024-01-03 11:51:42.077584823 +0100
+++ gcc/wide-int.cc	2024-02-02 18:13:34.993332159 +0100
@@ -729,20 +729,19 @@ wi::set_bit_large (HOST_WIDE_INT *val, c
     }
 }
 
-/* Byte swap the integer represented by XVAL and LEN into VAL.  Return
+/* Byte swap the integer represented by XVAL and XLEN into VAL.  Return
    the number of blocks in VAL.  Both XVAL and VAL have PRECISION bits.  */
 unsigned int
 wi::bswap_large (HOST_WIDE_INT *val, const HOST_WIDE_INT *xval,
-		 unsigned int len, unsigned int precision)
+		 unsigned int xlen, unsigned int precision)
 {
-  unsigned int i, s;
+  unsigned int s, len = BLOCKS_NEEDED (precision);
 
   /* This is not a well defined operation if the precision is not a
      multiple of 8.  */
   gcc_assert ((precision & 0x7) == 0);
 
-  for (i = 0; i < len; i++)
-    val[i] = 0;
+  memset (val, 0, sizeof (unsigned HOST_WIDE_INT) * len);
 
   /* Only swap the bytes that are not the padding.  */
   for (s = 0; s < precision; s += 8)
@@ -753,7 +752,7 @@ wi::bswap_large (HOST_WIDE_INT *val, con
       unsigned int block = s / HOST_BITS_PER_WIDE_INT;
       unsigned int offset = s & (HOST_BITS_PER_WIDE_INT - 1);
 
-      byte = (safe_uhwi (xval, len, block) >> offset) & 0xff;
+      byte = (safe_uhwi (xval, xlen, block) >> offset) & 0xff;
 
       block = d / HOST_BITS_PER_WIDE_INT;
       offset = d & (HOST_BITS_PER_WIDE_INT - 1);
--- gcc/testsuite/gcc.dg/pr113722.c.jj	2024-02-02 18:25:22.702561427 +0100
+++ gcc/testsuite/gcc.dg/pr113722.c	2024-02-02 18:21:00.109186858 +0100
@@ -0,0 +1,22 @@
+/* PR middle-end/113722 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2" } */
+
+int
+main ()
+{
+  unsigned __int128 a = __builtin_bswap128 ((unsigned __int128) 2);
+  if (a != ((unsigned __int128) 2) << 120)
+    __builtin_abort ();
+  a = __builtin_bswap128 ((unsigned __int128) 0xdeadbeefULL);
+  if (a != ((unsigned __int128) 0xefbeaddeULL) << 96)
+    __builtin_abort ();
+  a = __builtin_bswap128 (((unsigned __int128) 0xdeadbeefULL) << 64);
+  if (a != ((unsigned __int128) 0xefbeaddeULL) << 32)
+    __builtin_abort ();
+  a = __builtin_bswap128 ((((unsigned __int128) 0xdeadbeefULL) << 64)
+			  | 0xcafed00dfeedbac1ULL);
+  if (a != ((((unsigned __int128) 0xc1baedfe0dd0fecaULL) << 64)
+	    | (((unsigned __int128) 0xefbeaddeULL) << 32)))
+    __builtin_abort ();
+}

	Jakub