On Thu, Dec 4, 2014 at 9:19 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
>
> On 02/12/14 22:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
>>>
>>> Hi all,
>>>
>>> This is the arm implementation of the macro fusion hook.
>>> It tries to fuse movw+movt operations together. It also tries to take lo_sum
>>> RTXs into account since those generate movt instructions as well.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>>
>>> Ok for trunk?
>>
>>
>>
>>>   if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
>>> +    {
>>> +      /* We are trying to fuse
>>> +         movw imm / movt imm
>>> +         instructions as a group that gets scheduled together.  */
>>> +
>>
>> A comment here about the insn structure would be useful.
>
>
> Done. It's similar to the aarch64 adrp+add case. It does make it easier to
> read, thanks.
>
> 2014-12-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>       * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
>       * config/arm/arm.c (arm_macro_fusion_p): New function.
>       (arm_macro_fusion_pair_p): Likewise.
>       (TARGET_SCHED_MACRO_FUSION_P): Define.
>       (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
>       (ARM_FUSE_NOTHING): Likewise.
>       (ARM_FUSE_MOVW_MOVT): Likewise.
>       (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
>       arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
>       arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
>       arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
>       arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
>       arm_cortex_a5_tune): Specify fuseable_ops value.
>
>>
>>> +      set_dest = SET_DEST (curr_set);
>>> +      if (GET_CODE (set_dest) == ZERO_EXTRACT)
>>> +        {
>>> +          if (CONST_INT_P (SET_SRC (curr_set))
>>> +              && CONST_INT_P (SET_SRC (prev_set))
>>> +              && REG_P (XEXP (set_dest, 0))
>>> +              && REG_P (SET_DEST (prev_set))
>>> +              && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
>>> +            return true;
>>> +        }
>>> +      else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
>>> +               && REG_P (SET_DEST (curr_set))
>>> +               && REG_P (SET_DEST (prev_set))
>>> +               && GET_CODE (SET_SRC (prev_set)) == HIGH
>>> +               && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
>>> +        {
>>> +          return true;
>>> +        }
>>
>> Can we add a fast-path exit, something like:
>>
>> if (GET_MODE (set_dest) != SImode)
>>    return false;
>
>
> Done, but if/when we extend the function to handle more fusion cases it will
> need to be refactored, since we will want to just bail out of this MOVW+MOVT
> case rather than the whole function.

Sure -
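
For illustration, the refactoring mentioned above could be sketched roughly as
follows: the SImode check lives inside a helper for the MOVW+MOVT case, so a
failed check only skips that case and later fusion cases can still be tried.
This is a minimal standalone model, not the patch itself. The types and names
here (struct set_insn, movw_movt_pair_p, fusion_pair_p, FUSE_*) are
hypothetical stand-ins and do not use GCC's real RTL API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for GCC RTL; not the real API.  */
enum mode { MODE_SI, MODE_DI };
enum dest_kind { DEST_REG, DEST_ZERO_EXTRACT };
enum src_kind { SRC_CONST_INT, SRC_HIGH, SRC_LO_SUM, SRC_OTHER };

/* Models a single-set insn: (set dest src).  */
struct set_insn
{
  enum dest_kind dest_kind;  /* plain reg, or zero_extract wrapping a reg */
  unsigned regno;            /* register written (or wrapped) */
  enum mode mode;            /* mode of the destination */
  enum src_kind src_kind;    /* shape of the source operand */
};

/* Assumed flag values, mirroring the idea of a fuseable_ops bitmask.  */
#define FUSE_NOTHING   0u
#define FUSE_MOVW_MOVT (1u << 0)

/* Return true if PREV followed by CURR looks like a movw/movt pair.  */
static bool
movw_movt_pair_p (const struct set_insn *prev, const struct set_insn *curr)
{
  assert (prev && curr);

  /* Fast-path exit, scoped to this case only.  */
  if (curr->mode != MODE_SI)
    return false;

  /* Both insns must write the same register.  */
  if (prev->dest_kind != DEST_REG || prev->regno != curr->regno)
    return false;

  /* movw imm ; movt imm  (second insn sets a zero_extract of the reg).  */
  if (curr->dest_kind == DEST_ZERO_EXTRACT)
    return (prev->src_kind == SRC_CONST_INT
            && curr->src_kind == SRC_CONST_INT);

  /* (set reg (high ...)) ; (set reg (lo_sum ...)).  */
  return (curr->dest_kind == DEST_REG
          && prev->src_kind == SRC_HIGH
          && curr->src_kind == SRC_LO_SUM);
}

/* Top-level predicate: each enabled fusion case is tried in turn.  */
static bool
fusion_pair_p (unsigned fuseable_ops, const struct set_insn *prev,
               const struct set_insn *curr)
{
  if ((fuseable_ops & FUSE_MOVW_MOVT) && movw_movt_pair_p (prev, curr))
    return true;

  /* Further fusion cases would be tested here.  */
  return false;
}
```

With this shape, a non-SImode destination makes only movw_movt_pair_p return
false; fusion_pair_p itself keeps going.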

>
>>
>> I did wonder whether we wanted to use reg_overlap_mentioned_p as that
>> may simplify the logic a bit, but that's overkill here as we still
>> want to restrict it to the cases above.
>>
>> Otherwise OK.
>
>
> Here's the updated patch. I've tested on arm-none-eabi and made sure that the
> fusion still happens on the benchmarks I looked at.
> Ok?

Ok - thanks, sorry about the slow response - been on vacation and
still catching up.

regards
Ramana

>
> Thanks,
> Kyrill
>
>
>>
>> Ramana
>>
>>
>>
>>
>>> +    }
>>> +  return false;
>>> Thanks,
>>> Kyrill
>>>
>>> 2014-11-11  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>      * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
>>>      * config/arm/arm.c (arm_macro_fusion_p): New function.
>>>      (arm_macro_fusion_pair_p): Likewise.
>>>      (TARGET_SCHED_MACRO_FUSION_P): Define.
>>>      (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
>>>      (ARM_FUSE_NOTHING): Likewise.
>>>      (ARM_FUSE_MOVW_MOVT): Likewise.
>>>      (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
>>>      arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
>>>      arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
>>>      arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
>>>      arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune,
>>>      arm_cortex_a5_tune): Specify fuseable_ops value.
