Hello Kyrill, thanks for your comments.
2018-08-14 16:50 GMT+02:00 Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>: > Hi Kai, > > > On 13/08/18 17:48, Kai Tietz wrote: >> >> I repost updated patch containing ChangeLog entry. >> >> Regards, >> Kai > > > I think I understand what this patch does, please correct me if I'm wrong. > You model the processors micro-ops and some A64 instructions use multiple > micro-ops. > This is what the falkor_variable_issue attribute specifies. > In TARGET_SCHED_VARIABLE_ISSUE you count the issue slots that the micro-ops > take and how much "space" is left over, which is stored in leftover_uops > and you use leftover_uops in TARGET_SCHED_REORDER to tell the scheduler how > many more micro-ops it can issue on that cycle. Correct. > And with that change the issue_rate is no longer the *instruction* issue > rate, but rather the *micro-op* issue rate. > Overall this looks very similar to the implementation of this functionality > in the powerpc port (rs6000). > Is this correct? Yes, it is somewhat similar to the rs6000 variant. > I have a few comments on the implementation inline... > > Jim Wilson<jim.wil...@linaro.org> > Kai Tietz<kai.ti...@linaro.org> > > * config/aarch64.c (aarch64_sched_reorder): Implement > TARGET_SCHED_REORDER hook. > (aarch64_variable_issue): Implement TARGET_SCHED_VARIABLE_ISSUE > hook. > (TARGET_SCHED_REORDER): Define. > (TARGET_SCHED_VARIABLE_ISSUE): Likewise. > * config/aarch64/falkor.md (falkor_variable_issue): New. > > Index: aarch64/aarch64.c > =================================================================== > --- aarch64.orig/aarch64.c > +++ aarch64/aarch64.c > @@ -914,7 +914,7 @@ static const struct tune_params qdf24xx_ > &generic_branch_cost, > &generic_approx_modes, > 4, /* memmov_cost */ > - 4, /* issue_rate */ > + 8, /* issue_rate */ > (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD > | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops */ > "16", /* function_align. */ > @@ -17551,6 +17551,105 @@ aarch64_run_selftests (void) > #endif /* #if CHECKING_P */ > +/* The number of micro ops left over after issuing the last instruction in > a > + cycle. This is subtracted from the next cycle before we start issuing > insns. > + This is initialized to 0 at the start of every basic block. */ > +static int leftover_uops = 0; > + > > I believe the scheduler provides hooks specifically for storing > backend-specific scheduling state so we should > avoid creating such static variables in aarch64.c. Can you use the > TARGET_SCHED_*_SCHED_CONTEXT family of hooks here? > Then it will be up to the scheduler midend to keep track of the state and > between basic blocks, functions etc. I think you refer to the ppc implementation. But if you take a closer look to it, you will see that nevertheless such an implementation will require global variables. So I am not really sure it is worth to introduce the ..._SCHED_CONTEXT API to avoid one global variable by introducing at least two others. Neverthelss I am admit that making use of SCHED_CONTEXT could be a general nice to have, but not necessarily a gain in that case. > +/* Implement TARGET_SCHED_REORDER. */ > + > +static int > +aarch64_sched_reorder (FILE *file, int verbose, > + rtx_insn **ready ATTRIBUTE_UNUSED, > + int *n_readyp ATTRIBUTE_UNUSED, > + int clock) > +{ > + int can_issue_more = aarch64_sched_issue_rate (); > + > + if ((enum attr_tune) aarch64_tune == TUNE_FALKOR) > + { > + /* The start of a basic block. */ > + if (clock == 0) > + { > + if (leftover_uops && file && (verbose > 3)) > + fprintf (file, ";;\tLeftover uops ignored at bb start.\n"); > + > + leftover_uops = 0; > + } > + > + /* Account for issue slots left over from previous cycle. This value > + can be larger than the number of issue slots per cycle, so we need > + to check it here before scheduling any instructions. */ > + else if (leftover_uops) > + { > + can_issue_more -= leftover_uops; > + > + if (file && (verbose > 3)) > + { > + fprintf (file, ";;\tUse %d issue slots for leftover uops.\n", > + leftover_uops); > + fprintf (file, ";;\t%d issue slots left.\n", can_issue_more); > + } > + > + leftover_uops = 0; > + > + if (can_issue_more < 0) > + { > + leftover_uops = 0 - can_issue_more; > + can_issue_more = 0; > + > + if (file && (verbose > 3)) > + { > + fprintf (file, ";;skipping issue cycle.\n"); > + fprintf (file, ";;\t%d uops left over.\n", leftover_uops); > + } > + } > + } > + } > + > + return can_issue_more; > +} > + > +/* Implement TARGET_SCHED_VARIABLE_ISSUE. */ > + > > A comment here like you have for TARGET_SCHED_REORDER describing what this > function accomplishes would be very helpful. Ok. As it is a simple implementation of a well described hook, it didn't seemed to me that it would need more comment. but I am open for improving it. > +static int > +aarch64_variable_issue (FILE *file, int verbose, > + rtx_insn *insn, int more) > +{ > + if (GET_CODE (PATTERN (insn)) != USE > + && GET_CODE (PATTERN (insn)) != CLOBBER) > + { > + if ((enum attr_tune) aarch64_tune != TUNE_FALKOR) > + more -= 1; > + else > + { > + int issue_slots = get_attr_falkor_variable_issue (insn); > + more -= issue_slots; > + > > > We generally try to avoid having explicit CPU-specific checks like this in > the aarch64 backend. > Instead we try to keep all the CPU-specific tuning information in the CPU > tuning structures. > > This particular infrastructure looks like it could be used for other CPUs in > the future. In order for that to happen we don't want > to have a check of aarch64_tune, but rather a tuning flag defined in > aarch64-tuning-flags.def that tells us whether we're scheduling > considering micro-ops or not. It can be on by default in the falkor tuning > struct. Ok, this is fair. I will add such a tuning-flags.def indicating scheduling for micro-ops/or not. > Then, we'd want the falkor_variable_issue attribute to be a generic > attribute that specifies the number of micro-ops per-instruction and > per-cpu. Ok, so I will rename hook to a generic name instead. > Unfortunately I'm don't see how this could be done in the current RTL > attribute infrastructure without creating a big unwieldy attribute that will > be hard to keep > up to date :( Ideally we'd want something that could be specified in each > CPU's .md file. Any ideas? Sadly no. I have thought about this too. The approach via such a 'big' attribute looks long-term hard to handle. As you mentioned the maintaining for such a beast will be gross. Therefore I descided for now to address it in an cpu-flavor specific way. > Thanks, > Kyrill > Thanks, Kai