On 2023/01/08 6:53, Max Filippov wrote:
> On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3...@yahoo.co.jp> wrote:
>>
>> This patch optimizes the operation of cutting and splicing two register
>> values at a specified bit position, in other words, combining (bitwise
>> ORing) bits 0 through (C-1) of the register with bits C through 31
>> of the other, where C is the specified immediate integer 1 through 31.
>>
>> This typically applies to signedness copy of floating point number or
>> __builtin_return_address() if the windowed register ABI, and saves one
>> instruction compared to four shifts and a bitwise OR by the RTL
>> generation pass.
> 
> While I indeed see this kind of change, e.g.:
> -       extui   a3, a3, 27, 5
> -       slli    a2, a2, 5
> -       srli    a2, a2, 5
> -       slli    a3, a3, 27
> -       or      a2, a2, a3
> +       slli    a2, a2, 5
> +       extui   a3, a3, 27, 5
> +       ssai    5
> +       src     a2, a3, a2
> 
> I also see the following:
> -       movi.n  a6, -4
> -       and     a5, a5, a6
> -       extui   a3, a3, 0, 2
> -       or      a3, a3, a5
> +       srli    a5, a5, 2
> +       slli    a3, a3, 30
> +       ssai    30
> +       src     a3, a5, a3
> 
> i.e. after the split there's the same number of instructions,
> but the new sequence is one byte longer than the original one
> because of the movi.n.
> 
> Looking at a bunch of linux builds I observe a slight code size
> growth in call0 kernels and a slight code size reduction in
> windowed kernels.
> 
>> gcc/ChangeLog:
>>
>>         * config/xtensa/xtensa.md (*splice_bits):
>>         New insn_and_split pattern.
>> ---
>>  gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
>> index 0a26d3dccf4..36ec1b1918e 100644
>> --- a/gcc/config/xtensa/xtensa.md
>> +++ b/gcc/config/xtensa/xtensa.md
>> @@ -746,6 +746,53 @@
>>     (set_attr "mode"    "SI")
>>     (set_attr "length"  "3")])
>>
>> +(define_insn_and_split "*splice_bits"
>> +  [(set (match_operand:SI 0 "register_operand" "=a")
>> +       (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
>> +                       (match_operand:SI 3 "const_int_operand" "i"))
>> +               (and:SI (match_operand:SI 2 "register_operand" "r")
>> +                       (match_operand:SI 4 "const_int_operand" "i"))))]
>> +
>> +  "!optimize_debug && optimize
>> +   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
>> +   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
>> +       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
>> +  "#"
>> +  "&& can_create_pseudo_p ()"
>> +  [(set (match_dup 5)
>> +       (ashift:SI (match_dup 1)
>> +                  (match_dup 4)))
>> +   (set (match_dup 6)
>> +       (lshiftrt:SI (match_dup 2)
>> +                    (match_dup 3)))
>> +   (set (match_dup 0)
>> +       (ior:SI (lshiftrt:SI (match_dup 5)
>> +                            (match_dup 4))
>> +               (ashift:SI (match_dup 6)
>> +                          (match_dup 3))))]
>> +{
>> +  int shift;
>> +  if (INTVAL (operands[3]) < 0)
>> +    {
>> +      rtx x;
>> +      x = operands[1], operands[1] = operands[2], operands[2] = x;
>> +      x = operands[3], operands[3] = operands[4], operands[4] = x;
>> +    }
>> +  shift = floor_log2 (INTVAL (operands[3]) + 1);
>> +  operands[3] = GEN_INT (shift);
>> +  operands[4] = GEN_INT (32 - shift);
>> +  operands[5] = gen_reg_rtx (SImode);
>> +  operands[6] = gen_reg_rtx (SImode);
>> +}
>> +  [(set_attr "type"    "arith")
>> +   (set_attr "mode"    "SI")
>> +   (set (attr "length")
>> +       (if_then_else (match_test "TARGET_DENSITY
>> +                                  && (INTVAL (operands[3]) == 0x7FFFFFFF
>> +                                      || INTVAL (operands[4]) == 
>> 0x7FFFFFFF)")
>> +                     (const_int 11)
>> +                     (const_int 12)))])
> 
> I wonder how the length could be 11 here? I always see 4 3-byte
> instructions generated by this pattern.
> 

Sorry, I should have carried out a systematic test beforehand:

    #define TEST(c) \
      unsigned int test_ ## c (unsigned int a, unsigned int b) { \
        return (a & (-1U >> c)) | (b & ~(-1U >> c)); \
      }
    TEST(1)
    TEST(2)
      ...
    TEST(30)
    TEST(31)

Without this patch, compiling the above if c is:

 a. between 1 and 15, slli (or add.n) + extui + slli + srli + or
 b. 16 then extui + slli + extui + or
 c. between 17 and 20, srli + slli + extui + or 
 d. between 21 and 31, movi(.n) + and + extui + or

Clearly, the patch should be restricted to apply only to case a.

Reply via email to