> On 19 Aug 2017, at 4:10 AM, Richard Henderson <r...@twiddle.net> wrote:
> 
> On 08/17/2017 03:29 PM, Michael Clark wrote:
>> hand coded x86 asm (no worse because the sar depends on the lea)
>> 
>>      sx5(int):
>>        shl edi, 27
>>        sar edi, 27
>>        movsx eax, dl
> 
> Typo in the register, but I know what you mean.  More interestingly, edi
> already has the sign-extended value, so "mov eax, edi" suffices (saving one
> byte or perhaps allowing better register allocation).
> 
> That said, if anyone is tweaking x86-64 patterns on this, consider
> 
> sx5(int):
>       rorx    eax, edi, 5
>       sar     eax, 27
> 
> where the (newish) bmi2 rorx instruction allows the data to be moved into 
> place
> while rotating, avoiding the extra move entirely.

The patterns might be hard to match unless we can get the gimple expansions for 
bitfield access to use the mode of the enclosing type.

For bitfield accesses to SI mode fields under 8 bits on RISC-V, gcc is 
generating two shifts on QI mode sub-registers, each with a sign-extend.

For bitfield accesses to SI mode fields under 16 bits on RISC-V, gcc is 
generating two shifts on HI mode sub-registers, each with a sign-extend.

The bitfield expansion logic selects the narrowest mode that can hold the 
bitwidth of the field, rather than the mode of the enclosing type. This appears 
to happen in the gimple-to-RTL transform, possibly in expr.c.

Using the mode of the enclosing type for bitfield member access expansion would 
likely solve this problem. I suspect that, given caches and word-sized register 
accesses, this sort of change would be reasonable for 32-bit and 64-bit targets.
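
For comparison, expanding the same access in the mode of the enclosing type 
would amount to something like this (assumed ideal form; it is what the rv32 
listing below already achieves after combine):

        /* A single SI mode shift pair over the full register, for a
           5-bit signed field at bit offset 0 of an int (assumption).  */
        static inline int sx5_wide(int x)
        {
            return (x << 27) >> 27;
        }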

I also noticed something I didn’t spot earlier. On RV32 the sign extend and 
shifts are correctly coalesced into 27-bit shifts in the combine stage. We are 
presently looking into another redundant sign extension issue on RV64 that 
could potentially be related. It may be that the shift coalescing optimisation 
doesn’t happen unless the redundant sign extensions are eliminated early in 
combine by simplify_rtx, i.e. the pattern is more complex due to sign_extend 
ops that are not eliminated.

- https://cx.rv8.io/g/2FxpNw

RV64 and AArch64 both appear to have the issue, but with different expansions 
for the shift-and-extend pattern depending on the mode of access (QI or HI). 
Field accesses above 16 bits create SI mode accesses and generate the expected 
code. The RISC-V compiler has the #define SLOW_BYTE_ACCESS 1 patch, but it 
appears to make no difference in this case. SLOW_BYTE_ACCESS suppresses QI mode 
and HI mode loads in some bitfield test cases where a struct is passed by 
pointer, but it has no effect on this particular issue. The link below shows 
the codegen for the fix to the SLOW_BYTE_ACCESS issue, i.e. proof that the 
compiler has SLOW_BYTE_ACCESS defined to 1.

- https://cx.rv8.io/g/TyXnoG
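
For reference, the patch in question amounts to defining the macro in the 
RISC-V target header; a minimal sketch, assuming gcc/config/riscv/riscv.h as 
the location:

        /* Tell the middle end that sub-word (QI/HI mode) memory accesses
           are no faster than word accesses, so it prefers word-sized
           loads when extracting bitfields.  */
        #define SLOW_BYTE_ACCESS 1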

A target-independent fix that would solve the issue on ARM and RISC-V would be 
to access bitfield members with the mode of the bitfield member's enclosing 
type instead of the smallest mode that can hold the bitwidth of the field. If 
we had a char or short member in the struct, I could understand the use of QI 
and HI mode, as we would need narrower accesses due to alignment issues, but in 
this case the member is an int and one would expect this to expand to SI mode 
accesses if the enclosing type is SI mode.
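
For reference, the listings below correspond to a test case along these lines 
(my reconstruction; the field widths match the shift counts):

        struct bf5  { int x : 5;  };
        struct bf17 { int x : 17; };

        int sx5(int a)  { struct bf5  s = { a };  return s.x; }
        int sx17(int a) { struct bf17 s = { a };  return s.x; }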


riscv64:

        sx5(int):
          slliw a0,a0,3
          slliw a0,a0,24
          sraiw a0,a0,24
          sraiw a0,a0,3
          ret

        sx17(int):
          slliw a0,a0,15
          sraiw a0,a0,15
          ret

riscv32:

        sx5(int):
          slli a0,a0,27
          srai a0,a0,27
          ret

        sx17(int):
          slli a0,a0,15
          srai a0,a0,15
          ret

aarch64:

        sx5(int):
          sbfiz w0, w0, 3, 5
          asr w0, w0, 3
          ret

        sx17(int):
          sbfx w0, w0, 0, 17
          ret
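
For comparison, with an SI mode expansion one would expect the aarch64 sx5 to 
collapse to a single signed bitfield extract, something like the following (my 
expectation, not compiler output):

        sx5(int):
          sbfx w0, w0, 0, 5
          ret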

