On 07/02/2020 02:12, Modi Mo via gcc-patches wrote:
I did a quick bootstrap, this shows several failures like:

gcc/builtins.c:9427:1: error: unrecognizable insn:
  9427 | }
       | ^
(insn 212 211 213 24 (set (reg:SI 207)
         (zero_extract:SI (reg:SI 206)
             (const_int 26 [0x1a])
             (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
      (nil))

The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
Currently cases like this are split into a right shift in aarch64.md around line
5569:

Appreciate you taking a look and the validation. I've gotten access to an 
aarch64 server and the bootstrap demonstrated the issue you saw. This was 
caused by my re-definition of the pattern to:
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;

Which meant for SImode only a sum of >32 bit actually triggers the fail condition 
for the define_expand whereas the existing define_insn fails on >=32 bit. I looked 
into the architecture reference manual and the bits are available for ubfx/sbfx for 
that type of encoding and the documentation says you can use [lsb, 32-lsb] for SImode 
as a legal pair. Checking with the GNU assembler it does accept a sum of 32 but 
transforms it into a LSR:

Assembly file:
ubfx    w0, w0, 24, 8

Disassembly of section .text:

0000000000000000 <f>:
    0:   53187c00        lsr     w0, w0, #24

This is how LSR is implemented in AArch64, as an extract. So the disassembler is showing the canonical representation.

However, inside the compiler we really want to represent this as a shift. Why? Because there are cases where we can then merge a shift into other operations when we can't merge the more general extract operation. For example, we can merge an LSR into a subsequent 'AND' operation, but can't merge a more general extract into an AND. Essentially, we want the compiler to describe this in the canonical shift form rather than the extract form.

Ideally this would be handled inside the mid-end expansion of an extract, but in the absence of that I think this is best done inside the extv expansion so that we never end up with a real extract in that case.



Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other 
assemblers could trip over, I've attached a new patch that allows this encoding 
and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's 
better to explicitly do the transformation in GCC.

;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
instruction taking advantage of the implicit zero-extension of the X-reg.
(define_split
   [(set (match_operand:DI 0 "register_operand")
         (zero_extract:DI (match_operand:DI 1 "register_operand")
                          (match_operand 2
                            "aarch64_simd_shift_imm_offset_di")
                          (match_operand 3
                            "aarch64_simd_shift_imm_di")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
              GET_MODE_BITSIZE (DImode) - 1)
    && (INTVAL (operands[2]) + INTVAL (operands[3]))
        == GET_MODE_BITSIZE (SImode)"
   [(set (match_dup 0)
         (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
   {
     operands[4] = gen_lowpart (SImode, operands[1]);
   }

However that only supports DImode, not SImode, so it needs to be changed
to be more general using GPI.

Your new extv patterns should replace the magic patterns above it:

With the previous discovery that a sum of 32/64 will trigger LSR in the 
assembler I was curious what would happen if I remove this pattern. Turns out, 
you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the 
test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which 
doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. 
So this pattern still has value but I don't think it's necessary to extend it 
to DI since that'll automatically get turned into a LSR by the assembler as I 
previously mentioned.


;; -------------------------------------------------------------------
;; Bitfields
;; -------------------------------------------------------------------

(define_expand "<optab>"

These are the current extv/extzv patterns, but without a mode. They are no
longer used when we start using the new ones.

Note you can write <optab><mode> to combine the extzv and extz patterns.
But please add a comment mentioning the pattern names so they are easy to
find!

Good call here, made this change in the new patch. I did see the define_insn of these 
guys below it but somehow missed that they were expanded just above. I believe, from 
my understanding of GCC, that the matching pattern below the first line is what 
constrains <optab> into just extv/extsv from the long list of iterators it 
belongs to. Still, I see that there's constrained iterators elsewhere like:

;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend "") (zero_extend "u")

I added a comment in this patch before the pattern. Thoughts on defining 
another constrained version to make it clearer (in addition or in lieu of the 
comment)?

Besides a bootstrap it is always useful to compile a large body of code with
your change (say SPEC2006/2017) and check for differences in at least
codesize. If there is an increase in instruction count then there may be more
issues that need to be resolved.

Sounds good. I'll get those setup and running and will report back on findings. 
What's the preferred way to measure codesize? I'm assuming by default the code 
pages are aligned so smaller differences would need to trip over the boundary 
to actually show up.
I find it easiest to develop on a many-core AArch64 server so you get much
faster builds, bootstraps and regression tests. Cross compilers are mostly
useful if you want to test big-endian or new architecture features which are
not yet supported in hardware. You don't normally need to test
Go/Fortran/ADA etc unless your patch does something that would directly
affect them.

Good to know. Sounds like our current testing with C/C++ testsuite is alright 
for most changes.

Finally do you have a copyright assignment with the FSF?

Lawyers are working on that. Since I have SPEC codesize to go through the 
licensing may be ready before all analysis is done for the patch.

Modi


R.

Reply via email to