On 01/12/15 03:19, Bin.Cheng wrote:
> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
> <richard.earns...@foss.arm.com> wrote:
>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>> On 24/11/15 02:51, Bin.Cheng wrote:
>>>>>>>> The problem on aarch64 is that we don't define an addptr3
>>>>>>>> pattern, and we don't have a direct insn pattern describing
>>>>>>>> "x + y << z".  According to the GCC internals manual:
>>>>>>>>
>>>>>>>> ‘addptrm3’
>>>>>>>> Like addm3 but is guaranteed to only be used for address calculations.
>>>>>>>> The expanded code is not allowed to clobber the condition code. It
>>>>>>>> only needs to be defined if addm3 sets the condition code.
>>>>>>
>>>>>> addm3 on aarch64 does not set the condition codes, so by this rule we
>>>>>> shouldn't need to define this pattern.
>>>> Hi Richard,
>>>> I think that rule has a prerequisite: the backend needs to support
>>>> register-shifted addition in its addm3 pattern.
>>>
>>> addm3 is a named pattern and its format is well defined.  It does not
>>> take a shifted operand and never has.
>>>
>>>> Apparently for AArch64,
>>>> addm3 only supports "reg+reg" or "reg+imm".  Also, it is not really
>>>> true that we "do not set the condition codes", because both
>>>> "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>
>>> You appear to be confusing named patterns (used by expand) with
>>> recognizers.  Anyway, we have
>>>
>>> (define_insn "*add_<shift>_<mode>"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>>         (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>                               (match_operand:QI 2 "aarch64_shift_imm_<mode>" "n"))
>>>                   (match_operand:GPI 3 "register_operand" "r")))]
>>>
>>> Which is a non-flag setting add with shifted operand.
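For illustration, with the operand numbering above that pattern would
match an instruction of the form (register names chosen arbitrarily):

  add     x0, x3, x1, lsl 2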
>>>
>>>> Either way I think it is another backend issue, so do you approve that
>>>> I commit this patch now?
>>>
>>> Not yet.  I think there's something fundamental amiss here.
>>>
>>> BTW, it looks to me as though addptr<m>3 should have exactly the same
>>> operand rules as add<m>3 (documentation reads "like add<m>3"), so a
>>> shifted operand shouldn't be supported there either.  If that isn't the
>>> case then that should be clearly called out in the documentation.
>>>
>>> R.
>>>
>>
>> PS.
>>
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  I.e.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))))
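For reference, with concrete operands the canonical ordering looks
something like this (mode and register numbers purely illustrative):

  (plus:DI (ashift:DI (reg:DI 1) (const_int 2))
           (reg:DI 2))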
> 
> Hi Richard,
> Thanks for the comments.  I realized that the not-recognized-insn
> issue arises because the original patch builds non-canonical
> expressions.  When reloading address expressions, LRA generates
> non-canonical register-scaled insns, which can't be recognized by the
> aarch64 backend.
> 
> Here is the updated patch using the canonical form pattern; it passes
> bootstrap and regression testing.  The ivo failure still exists, but
> it is analyzed in the original message.
> 
> Is this patch OK?
> 
> As for Jiong's concern about the additional extension instruction, I
> think this only applies to the atomic load/store instructions.  For
> general loads and stores, AArch64 supports zext/sext in the
> register-scaled addressing modes, so the additional instruction can be
> forward propagated into the memory reference.  The problem for atomic
> load/store is that AArch64 only supports the direct register
> addressing mode there.  After LRA reloads the address expression out
> of the memory reference, there is no combine/fwprop pass to merge the
> instructions back.  The underlying problem is that atomic_store's
> predicate doesn't match its constraint: the predicate used for
> atomic_store<mode> is memory_operand, while all other atomic patterns
> use aarch64_sync_memory_operand.  I think this might be a typo.  With
> this change, expand will no longer generate addressing modes that
> require a reload.  I will test another patch fixing this.
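A minimal sketch of the predicate change described above; only the
predicate name of operand 0 in the atomic_store<mode> pattern changes,
everything else is elided:

  -    [(set (match_operand:... 0 "memory_operand" ...)
  +    [(set (match_operand:... 0 "aarch64_sync_memory_operand" ...)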
> 
> Thanks,
> bin

Some comments inline.

>>
>> R.
>>
>> aarch64_legitimize_addr-20151128.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3fe2f0f..5b3e3c4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
>>       We try to pick as large a range for the offset as possible to
>>       maximize the chance of a CSE.  However, for aligned addresses
>>       we limit the range to 4k so that structures with different sized
>> -     elements are likely to use the same base.  */
>> +     elements are likely to use the same base.  We need to be careful
>> +     not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,

>> +     will generate sub-optimal code.  */
>>  
>>    if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>      {
>>        HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>        HOST_WIDE_INT base_offset;
>>  
>> +      if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +    {
>> +      rtx op0 = XEXP (XEXP (x, 0), 0);
>> +      rtx op1 = XEXP (XEXP (x, 0), 1);
>> +
>> +      /* For addr expression in the form like "r1 + r2 + 0x3ffc".
>> +         Since the offset is within range supported by addressing
>> +         mode "reg+offset", we don't split the const and legalize
>> +         it into below insn and expr sequence:
>> +           r3 = r1 + r2;
>> +           "r3 + 0x3ffc".  */

I think this comment would read better as

        /* Address expressions of the form Ra + Rb + CONST.

           If CONST is within the range supported by the addressing
           mode "reg+offset", do not split CONST and use the
           sequence
                Rt = Ra + Rb
                addr = Rt + CONST.  */

>> +      if (REG_P (op0) && REG_P (op1))
>> +        {
>> +          machine_mode addr_mode = GET_MODE (x);
>> +          rtx base = gen_reg_rtx (addr_mode);
>> +          rtx addr = plus_constant (addr_mode, base, offset);
>> +
>> +          if (aarch64_legitimate_address_hook_p (mode, addr, false))
>> +            {
>> +              emit_insn (gen_adddi3 (base, op0, op1));
>> +              return addr;
>> +            }
>> +        }
>> +      /* For addr expression in the form like "r1 + r2<<2 + 0x3ffc".
>> +         Live above, we don't split the const and legalize it into
>> +         below insn and expr sequence:

Similarly.
>> +           r3 = 0x3ffc;
>> +           r4 = r1 + r3;
>> +           "r4 + r2<<2".  */

Why don't we generate

  r3 = r1 + r2 << 2
  r4 = r3 + 0x3ffc

utilizing the shift-and-add instructions?
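Roughly (untested; reusing x, mode and offset from the hunk above), that
alternative would look something like:

	  machine_mode addr_mode = GET_MODE (x);
	  /* Force "r1 + r2 << 2" into a register first ...  */
	  rtx base = force_operand (XEXP (x, 0), NULL_RTX);
	  /* ... then add the constant on top.  */
	  rtx addr = plus_constant (addr_mode, base, offset);

	  if (aarch64_legitimate_address_hook_p (mode, addr, false))
	    return addr;

Whether expand would emit the single shift-and-add insn directly here, or
leave it for combine to form later, would need checking.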

>> +      else if (REG_P (op0) || REG_P (op1))
>> +        {
>> +          machine_mode addr_mode = GET_MODE (x);
>> +          rtx base = gen_reg_rtx (addr_mode);
>> +
>> +          /* Switch to make sure that register is in op0.  */
>> +          if (REG_P (op1))
>> +            std::swap (op0, op1);
>> +
>> +          rtx addr = gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
Use gen_rtx_PLUS.
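That is, something like:

	  rtx addr = gen_rtx_PLUS (addr_mode, op1, base);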

>> +
>> +          if (aarch64_legitimate_address_hook_p (mode, addr, false))
>> +            {
>> +              base = force_operand (plus_constant (addr_mode,
>> +                                                   op0, offset),
>> +                                    NULL_RTX);
>> +              return gen_rtx_fmt_ee (PLUS, addr_mode, op1, base);
Likewise.

>> +            }
>> +        }
>> +    }
>> +
>>        /* Does it look like we'll need a load/store-pair operation?  */
>>        if (GET_MODE_SIZE (mode) > 16
>>        || mode == TImode)

R.
