[Bug target/120920] RISC-V: Possible optimization of bswap when zbb is enabled

Dusan.Stojkovic--- via Gcc-bugs Thu, 17 Jul 2025 08:01:20 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120920


--- Comment #3 from Dusan Stojkovic <dusan.stojko...@rt-rk.com> ---
Based on Jeff's review, I updated the patch:
https://patchwork.sourceware.org/project/gcc/patch/pr3pr08mb5738402789a50779af3ae0abbe...@pr3pr08mb5738.eurprd08.prod.outlook.com/

>> A possibility for improvement for rv64 could be:
>> ```
>> bswap8:
>>         rev8 t0, a0
>>         srli t1, a0, 16
>>         srai t0, t0, 48
>>         slli t1, t1, 16
>>         or   a0, t1, t0
>>         ret
>> ```

>> I suspect you'll need to look at a simplify-rtx simplification of some kind.

I ended up implementing a peephole pattern. A pattern that matches the
entire bswap8 sequence would be quite large, so the peephole covers only
part of it, and the resulting assembly is:

bswap8:
        rev8    a5,a0
        li      a4,-65536
        srli    a5,a5,48
        and     a0,a0,a4
        or      a0,a0,a5
        sext.w  a0,a0
        ret 
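
To sanity-check the sequence above, here is a rough C model of it. This is
only a sketch: the function names are made up for illustration, and it assumes
the RV64 convention that a 32-bit argument arrives sign-extended in a0. Under
that assumption the final sext.w should leave the value unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models sext.w (sign-extend the low 32 bits).
   Relies on GCC's modulo behaviour for the uint32_t -> int32_t cast. */
static uint64_t sext_w(uint64_t x)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)x;
}

/* Hypothetical model of the sequence up to, but not including, sext.w. */
static uint64_t rv64_before_sext(uint32_t n)
{
    uint64_t a0 = sext_w(n);               /* assumed: arg is sign-extended */
    uint64_t a5 = __builtin_bswap64(a0);   /* rev8 a5,a0    */
    uint64_t a4 = (uint64_t)-65536LL;      /* li   a4,-65536 */
    a5 >>= 48;                             /* srli a5,a5,48 */
    a0 &= a4;                              /* and  a0,a0,a4 */
    a0 |= a5;                              /* or   a0,a0,a5 */
    return a0;
}

/* The low two bytes are swapped and sext.w does not change the result. */
static void check_sequence(uint32_t n)
{
    uint64_t r = rv64_before_sext(n);
    uint32_t expect = (n & 0xffff0000u) | ((n >> 8) & 0xffu) | ((n << 8) & 0xffffu);
    assert((uint32_t)r == expect);
    assert(sext_w(r) == r);
}
```

Calling check_sequence() on a few values with and without bit 31 set is enough
to exercise both the zero and the sign-extended upper half.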

--------------------------------------------------------------------------------

Interestingly, even though only bits 0..7 and 8..15 are swapped, the bswap
pass treats this as a bswap32:

32 bit bswap implementation found at: _10 = _3 | _6;
unsigned int bswap8 (unsigned int n)
{
  unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  unsigned int bswapdst_4;
  unsigned int _5;
  unsigned int _6;
  unsigned int _8;
  unsigned int _10;
  unsigned int bswapmaskdst_11;

  <bb 2> [local count: 1073741824]:
  _1 = n_7(D) & 4294901760;
  _2 = n_7(D) >> 8;
  _3 = _2 & 255;
  _5 = n_7(D) << 8;
  _6 = _5 & 65535;
  bswapdst_4 = __builtin_bswap32 (n_7(D));
  bswapmaskdst_11 = bswapdst_4 & 4294901760;
  _10 = bswapmaskdst_11 r<< 16;
  _8 = _1 | _10;
  return _8;

}
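
For reference, the dump above corresponds roughly to the following C. This is
a reconstruction from the GIMPLE, not the original test case; the function
names are made up for illustration. The second function mirrors what the pass
emits (bswap32, mask, rotate left by 16) and should compute the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source form: keep the upper half, swap the two low bytes. */
static uint32_t swap_low_bytes(uint32_t n)
{
    return (n & 0xffff0000u)        /* _1 */
         | ((n >> 8) & 0xffu)       /* _3 */
         | ((n << 8) & 0xffffu);    /* _6 */
}

/* Rotate left; only used with r == 16 here, so the shifts are well defined. */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

/* Form produced by the pass: bswap32, mask the upper half, rotate by 16. */
static uint32_t via_bswap32(uint32_t n)
{
    uint32_t swapped = __builtin_bswap32(n);       /* bswapdst_4      */
    uint32_t masked  = swapped & 0xffff0000u;      /* bswapmaskdst_11 */
    return (n & 0xffff0000u) | rotl32(masked, 16); /* _1 | _10        */
}

/* Both forms should agree for any input. */
static void check_forms_agree(uint32_t n)
{
    assert(swap_low_bytes(n) == via_bswap32(n));
}
```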

I narrowed it down to the attached diff.
With the patch applied, the function becomes:
bswap8:
        rev8    a5,a0
        li      a4,-65536
        srli    a5,a5,48
        and     a0,a0,a4
        or      a0,a5,a0
        ret

The GIMPLE in the bswap pass then becomes:

...
16 bit bswap implementation found at: _10 = _3 | _6;
unsigned int bswap8 (unsigned int val)
{
  unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  short unsigned int bswapsrc_4;
  unsigned int _5;
  unsigned int _6;
  unsigned int _8;
  unsigned int _10;
  short unsigned int bswapdst_11;

  <bb 2> [local count: 1073741824]:
  _1 = val_7(D) & 4294901760;
  _2 = val_7(D) >> 8;
  _3 = _2 & 255;
  _5 = val_7(D) << 8;
  _6 = _5 & 65535;
  bswapsrc_4 = (short unsigned int) val_7(D);
  bswapdst_11 = bswapsrc_4 r>> 8;
  _10 = (unsigned int) bswapdst_11;
  _8 = _1 | _10;
  return _8;

}
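
In C terms, the 16-bit form in this dump amounts to roughly the following.
Again this is just a sketch with made-up names, assuming the rotate is an
ordinary 16-bit rotate right by 8.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit value right by r bits (0 < r < 16). */
static uint16_t rotr16(uint16_t x, unsigned r)
{
    return (uint16_t)((x >> r) | (x << (16 - r)));
}

/* Hypothetical equivalent of the patched GIMPLE. */
static uint32_t via_rot16(uint32_t val)
{
    uint16_t src = (uint16_t)val;               /* bswapsrc_4  */
    uint16_t dst = rotr16(src, 8);              /* bswapdst_11 */
    return (val & 0xffff0000u) | (uint32_t)dst; /* _1 | _10    */
}

/* Compare against the shift-and-mask form from the original statements. */
static void check_rot16(uint32_t val)
{
    uint32_t expect = (val & 0xffff0000u)
                    | ((val >> 8) & 0xffu)
                    | ((val << 8) & 0xffffu);
    assert(via_rot16(val) == expect);
}
```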

I'm not sure how to approach this further in the bswap pass, though, since the
attached diff produces regressions on x86 (see PR 115102).

I just wanted to document it here for future reference.
