On 19/01/15 18:14, Maxim Kuvyrkov wrote:
> On Jan 19, 2015, at 6:05 PM, Richard Earnshaw <rearn...@arm.com> wrote:
>
>> On 16/01/15 15:06, Maxim Kuvyrkov wrote:
>>> @@ -1874,7 +1889,8 @@ const struct tune_params arm_cortex_a15_tune =
>>>    true, true,                 /* Prefer 32-bit encodings.  */
>>>    true,                       /* Prefer Neon for stringops.  */
>>>    8,                          /* Maximum insns to inline memset.  */
>>> -  ARM_FUSE_NOTHING            /* Fuseable pairs of instructions.  */
>>> +  ARM_FUSE_NOTHING,           /* Fuseable pairs of instructions.  */
>>> +  max_insn_queue_index + 1    /* Sched L2 autopref depth.  */
>>>  };
>>
>>
>> Hmm, two issues here:
>> 1) This requires a static constructor for the tuning table entry (since
>> the value of max_insn_queue_index has to be looked up at run time.
>
> Are you sure?  I didn't check the object files, but, since
> max_insn_queue_index is a "const int", I would expect a relocation that would
> be resolved at link time, not a constructor.
>

Yes, I'm sure.  Relocations can only resolve addresses of objects, not
their contents.  LTO might eliminate the need for the reloc, but
otherwise the compiler will never see the definition and will need to
create a static constructor.

> Is it a problem to have a static constructor for the tables?

Needing constructors means that the compiler can't put the object into
read-only sections of the image.  It's not a huge problem, but if there
are ways by which they can be avoided, that's likely to be preferable;
there's a small run-time overhead to running them.

>
>>
>> 2) Doesn't this mean that the depth of searching will depend on
>> properties of the automata rather than some machine specific values (so
>> that potentially adding or removing unrelated scheduler rules could
>> change the behaviour of the compiler)?
>
> No.  The extra queue entries that will appear from extending an unrelated
> automaton will be empty, so the search will check them, but won't find
> anything.
>

OK, so there's just a minor performance cost of checking values that
never hit.

>>
>> In general, how should someone tuning the compiler for this parameter
>> select a value that isn't one of (-1, m_i_q_d+1)?
>
> From my experiments it seems there are 4 reasonable values for the parameter:
> (-1) autopref turned off, (0) turned on in rank_for_schedule, (m_i_q_d+1)
> turned on everywhere.  If there is a static constructor generated for tune
> tables and it is a problem to have it -- I can shrink acceptable values to
> these 3 and call it a day.
>

You only mention 3 values: what was the fourth?

It might be better then to define a set of values that represent each of
these cases and only allow the tuning parameters to select one of those.
The init code then uses that set to select how to set up the various
parameters to meet those goals.

So something like

ARM_SCHED_AUTOPREF_OFF
ARM_SCHED_AUTOPREF_RANK
ARM_SCHED_AUTOPREF_FULL

R.

> --
> Maxim Kuvyrkov
> www.linaro.org
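
For illustration, here is a rough sketch of how the enumerated-values idea
could look.  Only the three ARM_SCHED_AUTOPREF_* names come from the
suggestion above; the type, struct, table and helper names (all suffixed
_sketch) are hypothetical and are not the actual GCC code.  The point is
that the table entry holds only an enumerator, so it can be initialized
at compile time, and the value that depends on max_insn_queue_index is
computed later by init code.

```cpp
#include <cassert>

// The three cases named in the thread.  The enum type name is hypothetical.
enum arm_sched_autopref_sketch
{
  ARM_SCHED_AUTOPREF_OFF,   // autopref turned off (depth -1)
  ARM_SCHED_AUTOPREF_RANK,  // turned on in rank_for_schedule only (depth 0)
  ARM_SCHED_AUTOPREF_FULL   // turned on everywhere (max_insn_queue_index + 1)
};

// Cut-down, hypothetical stand-in for struct tune_params.
struct tune_params_sketch
{
  int max_insns_inline_memset;
  arm_sched_autopref_sketch sched_autopref;
};

// constexpr forces constant initialization: every initializer is a
// compile-time constant, so no static constructor is needed.
constexpr tune_params_sketch cortex_a15_tune_sketch =
{
  8,                        // Maximum insns to inline memset.
  ARM_SCHED_AUTOPREF_FULL   // Sched L2 autopref mode.
};

// Hypothetical init-time helper: maps the selected mode to a lookahead
// depth, given the queue index value that is only known to other code.
constexpr int
autopref_depth_sketch (arm_sched_autopref_sketch mode, int max_queue_index)
{
  return mode == ARM_SCHED_AUTOPREF_OFF ? -1
	 : mode == ARM_SCHED_AUTOPREF_RANK ? 0
	 : max_queue_index + 1;
}
```

The init code would then call something like
autopref_depth_sketch (tune->sched_autopref, max_insn_queue_index) once,
instead of each table entry embedding max_insn_queue_index + 1 directly.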