On 19/01/15 18:14, Maxim Kuvyrkov wrote:
> On Jan 19, 2015, at 6:05 PM, Richard Earnshaw <rearn...@arm.com> wrote:
> 
>> On 16/01/15 15:06, Maxim Kuvyrkov wrote:
>>> @@ -1874,7 +1889,8 @@ const struct tune_params arm_cortex_a15_tune =
>>>   true, true,                                   /* Prefer 32-bit encodings.  */
>>>   true,                                         /* Prefer Neon for stringops.  */
>>>   8,                                            /* Maximum insns to inline memset.  */
>>> -  ARM_FUSE_NOTHING                             /* Fuseable pairs of instructions.  */
>>> +  ARM_FUSE_NOTHING,                            /* Fuseable pairs of instructions.  */
>>> +  max_insn_queue_index + 1                     /* Sched L2 autopref depth.  */
>>> };
>>
>>
>> Hmm, two issues here:
>> 1) This requires a static constructor for the tuning table entry (since
>> the value of max_insn_queue_index has to be looked up at run time).
> 
> Are you sure?  I didn't check the object files, but, since 
> max_insn_queue_index is a "const int", I would expect a relocation that would 
> be resolved at link time, not a constructor.
> 

Yes, I'm sure.  Relocations can only resolve addresses of objects, not
their contents.  LTO might eliminate the need for the reloc, but
otherwise the compiler will never see the definition and will need to
create a static constructor.
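
To illustrate, here is a minimal single-file sketch of the situation.  The
struct name, the accessor and the value 7 are made up for the example; only
max_insn_queue_index comes from the patch.  The definition is placed after
its use to stand in for a definition that lives in another translation unit,
so the compiler cannot treat the initializer as a constant expression at the
point where the table is defined.

```cpp
#include <cassert>

// Hypothetical stand-in for the real tuning structure.
struct tune_params_sketch
{
  int sched_autopref_queue_depth;
};

// Declaration only, as a header would provide it.  The value is not
// visible at this point.
extern const int max_insn_queue_index;

// The initializer is not a constant expression here, so the language
// allows this object to be initialized dynamically, i.e. by a static
// constructor that runs before main.
const tune_params_sketch cortex_a15_sketch = { max_insn_queue_index + 1 };

// Stands in for the definition in another translation unit.
// The value 7 is an assumption for illustration only.
const int max_insn_queue_index = 7;

// Hypothetical accessor used to observe the initialized value.
int autopref_depth ()
{
  return cortex_a15_sketch.sched_autopref_queue_depth;
}
```

With the definition genuinely in a separate file and no LTO, the compiler
has only the extern declaration to work with, so it has to emit code that
loads the value at start-up and stores it into the table.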

> Is it a problem to have a static constructor for the tables?

Needing constructors means that the compiler can't put the object into
read-only sections of the image.  It's not a huge problem, but if there
are ways by which they can be avoided, that's likely to be preferable;
there's a small run-time overhead to running them.

> 
>>
>> 2) Doesn't this mean that the depth of searching will depend on
>> properties of the automata rather than some machine specific values (so
>> that potentially adding or removing unrelated scheduler rules could
>> change the behaviour of the compiler)?
> 
> No.  The extra queue entries that will appear from extending an unrelated 
> automaton will be empty, so the search will check them, but won't find 
> anything.
> 

OK, so there's just a minor performance cost of checking values that
never hit.


>>
>> In general, how should someone tuning the compiler for this parameter
>> select a value that isn't one of (-1, m_i_q_d+1)?
> 
> From my experiments it seems there are 4 reasonable values for the parameter: 
> (-1) autopref turned off, (0) turned on in rank_for_schedule, (m_i_q_d+1) 
> turned on everywhere.  If there is a static constructor generated for tune 
> tables and it is a problem to have it -- I can shrink acceptable values to 
> these 3 and call it a day.
> 

You only mention 3 values: what was the fourth?  It might be better then
to define a set of values that represent each of these cases and only
allow the tuning parameters to select one of those.  The init code then
uses that set to select how to set up the various parameters to meet
those goals.

So something like

ARM_SCHED_AUTOPREF_OFF
ARM_SCHED_AUTOPREF_RANK
ARM_SCHED_AUTOPREF_FULL
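
A rough sketch of what I have in mind follows.  The enum type name, the
helper function and its parameter are hypothetical; the mapping to -1, 0 and
m_i_q_d+1 is taken from the three cases you describe above.

```cpp
#include <cassert>

// The three modes a tuning table would be allowed to select.
enum arm_sched_autopref
{
  ARM_SCHED_AUTOPREF_OFF,
  ARM_SCHED_AUTOPREF_RANK,
  ARM_SCHED_AUTOPREF_FULL
};

// Hypothetical helper for the init code: translate the mode chosen by
// the tuning table into the depth value the scheduler expects.  The
// automaton-dependent value is passed in at run time, so the tuning
// table itself only needs to hold a compile-time enumerator.
int autopref_queue_depth (arm_sched_autopref mode, int max_queue_index)
{
  switch (mode)
    {
    case ARM_SCHED_AUTOPREF_OFF:
      return -1;                   // autopref turned off
    case ARM_SCHED_AUTOPREF_RANK:
      return 0;                    // only in rank_for_schedule
    case ARM_SCHED_AUTOPREF_FULL:
      return max_queue_index + 1;  // turned on everywhere
    }
  return -1;                       // treat unknown modes as "off"
}
```

Since the table entry is then a plain enumerator, it can be initialized
statically and no constructor is needed.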


R.
> --
> Maxim Kuvyrkov
> www.linaro.org
> 
> 
> 
> 

