On Thu, Dec 4, 2014 at 9:19 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
>
> On 02/12/14 22:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov <kyrylo.tkac...@arm.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> This is the arm implementation of the macro fusion hook.
>>> It tries to fuse movw+movt operations together. It also tries to take
>>> lo_sum RTXs into account since those generate movt instructions as well.
>>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>>
>>> Ok for trunk?
>>
>>
>>
>>>    if (current_tune->fuseable_ops & ARM_FUSE_MOVW_MOVT)
>>> +    {
>>> +      /* We are trying to fuse
>>> +         movw imm / movt imm
>>> +         instructions as a group that gets scheduled together.  */
>>> +
>>
>> A comment here about the insn structure would be useful.
>
>
> Done. It's similar to the aarch64 adrp+add case. It does make it easier to
> read, thanks.
>
> 2014-12-04  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
>     * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
>     * config/arm/arm.c (arm_macro_fusion_p): New function.
>     (arm_macro_fusion_pair_p): Likewise.
>     (TARGET_SCHED_MACRO_FUSION_P): Define.
>     (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
>     (ARM_FUSE_NOTHING): Likewise.
>     (ARM_FUSE_MOVW_MOVT): Likewise.
>     (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
>     arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
>     arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
>     arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
>     arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune
>     arm_cortex_a5_tune): Specify fuseable_ops value.
>
>>
>>> +      set_dest = SET_DEST (curr_set);
>>> +      if (GET_CODE (set_dest) == ZERO_EXTRACT)
>>> +        {
>>> +          if (CONST_INT_P (SET_SRC (curr_set))
>>> +              && CONST_INT_P (SET_SRC (prev_set))
>>> +              && REG_P (XEXP (set_dest, 0))
>>> +              && REG_P (SET_DEST (prev_set))
>>> +              && REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
>>> +            return true;
>>> +        }
>>> +      else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
>>> +               && REG_P (SET_DEST (curr_set))
>>> +               && REG_P (SET_DEST (prev_set))
>>> +               && GET_CODE (SET_SRC (prev_set)) == HIGH
>>> +               && REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
>>> +        {
>>> +          return true;
>>> +        }
>>
>> Can we add a fast path exit to be
>>
>> if (GET_MODE (set_dest) != SImode)
>>   return false;
>
>
> Done, but if/when we extend the function to handle more fusion cases it will
> need to be refactored, since we will want to just bail out of this MOVW+MOVT
> case rather than the whole function.
Sure -

>
>>
>> I did think whether we wanted to use reg_overlap_mentioned_p as that
>> may simplify the logic a bit but that's overkill here as we still
>> want to restrict it to the cases above.
>>
>> Otherwise OK.
>
>
> Here's the updated patch. I've tested on arm-none-eabi and made sure that
> the fusion still happens on the benchmarks I looked at.
> Ok?

Ok - thanks, sorry about the slow response - been on vacation and still
catching up.

regards
Ramana

>
> Thanks,
> Kyrill
>
>
>>
>> Ramana
>>
>>
>>
>>
>>> +    }
>>> +  return false;
>>> Thanks,
>>> Kyrill
>>>
>>> 2014-11-11  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>>>
>>>     * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
>>>     * config/arm/arm.c (arm_macro_fusion_p): New function.
>>>     (arm_macro_fusion_pair_p): Likewise.
>>>     (TARGET_SCHED_MACRO_FUSION_P): Define.
>>>     (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
>>>     (ARM_FUSE_NOTHING): Likewise.
>>>     (ARM_FUSE_MOVW_MOVT): Likewise.
>>>     (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
>>>     arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
>>>     arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
>>>     arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
>>>     arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune
>>>     arm_cortex_a5_tune): Specify fuseable_ops value.
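
For readers following the thread without the GCC sources at hand, the pair
check discussed above (including the SImode fast-path exit) can be modelled
outside the compiler. This is only a rough sketch: the toy_* types and
toy_movw_movt_pair_p are hypothetical stand-ins invented for illustration,
not GCC's RTL API, and a zero_extract destination is assumed to always wrap a
register (the real code checks that with REG_P).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RTL codes and machine modes; not GCC types.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_ZERO_EXTRACT, TOY_HIGH, TOY_LO_SUM };
enum toy_mode { TOY_SImode, TOY_DImode };

/* A simplified single-set instruction.  */
struct toy_set
{
  enum toy_code dest_code;  /* TOY_REG or TOY_ZERO_EXTRACT.  */
  enum toy_mode dest_mode;  /* Mode of the destination.  */
  int dest_regno;           /* Register written, or wrapped by the extract.  */
  enum toy_code src_code;   /* TOY_CONST_INT, TOY_HIGH or TOY_LO_SUM.  */
};

/* Return true if PREV followed by CURR looks like a movw/movt pair.  */
static bool
toy_movw_movt_pair_p (const struct toy_set *prev, const struct toy_set *curr)
{
  /* Fast-path exit from the review: only SImode destinations qualify.  */
  if (curr->dest_mode != TOY_SImode)
    return false;

  /* Case 1: movw sets a register from a constant, then movt writes a
     constant into a zero_extract of that same register.  */
  if (curr->dest_code == TOY_ZERO_EXTRACT)
    return (curr->src_code == TOY_CONST_INT
            && prev->src_code == TOY_CONST_INT
            && prev->dest_code == TOY_REG
            && prev->dest_regno == curr->dest_regno);

  /* Case 2: a HIGH set followed by a LO_SUM set of the same register.  */
  return (curr->src_code == TOY_LO_SUM
          && curr->dest_code == TOY_REG
          && prev->dest_code == TOY_REG
          && prev->src_code == TOY_HIGH
          && prev->dest_regno == curr->dest_regno);
}
```

In this model a constant set of r3 followed by a zero_extract write to r3 is
accepted, while the same pair targeting different registers, or a non-SImode
destination, is rejected.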