On Thu, May 30, 2013 at 4:25 PM, Yuri Rumyantsev <[email protected]> wrote:
> Hi All
>
> The second patch enables several Silvermont uarch features that improve
> the performance of the new processor (based on experiments on real SLM
> hardware):
> 1. Using LEA is preferable when a 2-source or 3-source LEA provides a
> non-destructive destination, or when the ability to use SCALE is wanted.
> 2. FP conversions with a memory operand are transformed into a load
> followed by a conversion from a register.
> 3. A couple of improvements to post-reload scheduling:
> - increase the latency of integer loads and of load/store pairs with an
> exact dependence;
> - a simple reordering of the top of the ready list: if the 2 instructions
> at the top of the list have the same priority, the instruction whose
> producer(s) were scheduled earlier is considered the best candidate.
>
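The two scheduling tweaks in item 3 can be sketched in plain C. This is a minimal model under stated assumptions, not the patch's exact_store_load_dependency or swap_top_of_ready_list. The structs, field names and helper names are hypothetical. An "exact" dependence is taken to mean a load of exactly the bytes a store wrote. The extra latency is a caller-supplied parameter rather than the patch's value. "Producers scheduled earlier" is measured by the latest producer's issue tick. The ready list is assumed to be read from ready[n_ready - 1] downwards, as GCC's scheduler reads it in the TARGET_SCHED_REORDER hook.

```c
/* Hypothetical, simplified stand-ins for GCC's insn and MEM data;
   only the fields these two heuristics look at.  */
struct mem_ref
{
  int base_reg;   /* base register number */
  long offset;    /* constant displacement */
  int size;       /* access size in bytes */
};

struct sched_insn
{
  int priority;            /* scheduler priority */
  int last_producer_tick;  /* latest tick any producer issued; -1 if none */
};

/* 3a (assumed meaning): the dependence is "exact" when the load reads
   precisely the bytes the store wrote: same base, offset and size.  */
int
exact_store_load_match (const struct mem_ref *store,
                        const struct mem_ref *load)
{
  return store->base_reg == load->base_reg
         && store->offset == load->offset
         && store->size == load->size;
}

/* Raise the cost of a store->load dependence by EXTRA cycles only when
   it is exact.  EXTRA is a tuning knob of this sketch, not a number
   taken from the patch.  */
int
adjust_store_load_cost (int cost, const struct mem_ref *store,
                        const struct mem_ref *load, int extra)
{
  return exact_store_load_match (store, load) ? cost + extra : cost;
}

/* 3b: ready[n_ready - 1] is assumed to issue next.  When it and the
   runner-up ready[n_ready - 2] have equal priority and the runner-up's
   producers issued earlier, swap them.  Returns 1 if a swap happened.  */
int
swap_top_on_tie (struct sched_insn **ready, int n_ready)
{
  struct sched_insn *top, *next;

  if (n_ready < 2)
    return 0;
  top = ready[n_ready - 1];
  next = ready[n_ready - 2];
  if (top->priority != next->priority
      || next->last_producer_tick >= top->last_producer_tick)
    return 0;
  ready[n_ready - 1] = next;
  ready[n_ready - 2] = top;
  return 1;
}
```

Swapping only the top two entries keeps the tie-break cheap and leaves the rest of the priority order untouched.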
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
>
> 2013-05-30 Yuri Rumyantsev <[email protected]>
> Igor Zamyatin <[email protected]>
>
> Silvermont (SLM) architecture performance tuning.
> * config/i386/i386.h (enum ix86_tune_indices): Add
> X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS.
> (TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS): New define.
>
> * config/i386/i386.c (initial_ix86_tune_features)
> <X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS>: Initialize.
> (ix86_lea_outperforms): Handle Silvermont tuning.
> (ix86_avoid_lea_for_add): Add new argument to ix86_lea_outperforms
> call.
> (ix86_use_lea_for_mov): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_lea_for_add_ok): Likewise.
> (exact_dependency_1): New function.
> (exact_store_load_dependency): Likewise.
> (ix86_adjust_cost): Handle Silvermont tuning.
> (do_reorder_for_imul): Likewise.
> (swap_top_of_ready_list): New function.
> (ix86_sched_reorder): Changed to handle Silvermont tuning.
>
> * config/i386/i386.md (peepholes that split memory operand in fp
> converts): New.
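The peepholes named in this entry split an FP conversion with a memory operand. For the float-to-double case, cvtss2sd with a memory source becomes a movss load followed by a register-to-register cvtss2sd. At the source level, the two-step form can be illustrated with SSE2 intrinsics. This is only an illustration, assuming an x86 target with SSE2, and the function names are hypothetical. The compiler remains free to fold the load back into the conversion, so the intrinsics do not force a particular instruction sequence.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; x86 only */

/* Reference: plain C widening of a float read from memory.  */
double
widen_from_mem (const float *p)
{
  return (double) *p;
}

/* The split form spelled out with intrinsics: first load the float into
   an XMM register (typically movss), then widen register to register
   (typically cvtss2sd).  */
double
widen_split (const float *p)
{
  __m128 s = _mm_load_ss (p);                       /* load low float */
  __m128d d = _mm_cvtss_sd (_mm_setzero_pd (), s);  /* float -> double */
  return _mm_cvtsd_f64 (d);                         /* extract low double */
}
```

Both functions compute the same value, because widening a float to double is exact; only the instruction shape differs.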
@@ -24625,9 +24730,9 @@ ix86_sched_reorder(FILE *dump, int sched_verbose, rtx *ready, int *pn_ready,
- con = DEP_CON (dep);
- if (!NONDEBUG_INSN_P (con))
- continue;
+ con = DEP_CON (dep);
+ if (!NONDEBUG_INSN_P (con))
+ continue;
There are some unnecessary whitespace changes (tabs->spaces) in a
couple of places throughout the patch, such as in the above lines.
+(define_peephole2
+  [(set (match_operand:DF 0 "register_operand")
+        (float_extend:DF
+          (match_operand:SF 1 "memory_operand")))]
+  "TARGET_SPLIT_MEM_OPND_FOR_FP_CONVERTS
+   && optimize_insn_for_speed_p ()
+   && SSE_REG_P (operands[0])"
+  [(set (match_dup 2) (match_dup 1))
+   (set (match_dup 0) (float_extend:DF (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (SFmode, REGNO (operands[0]));
+})
You should use
(match_scratch:SF 2 "x")
at the top of the peephole2 pattern, and you will get a free scratch
register (assuming that it is not necessary to use the same register
for input and output operand of the float_extend insn).
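The effect of the suggested match_scratch can be modelled as picking any SSE register that is free across the matched insns, and giving up on the peephole when none is. This is a deliberately simplified, hypothetical model: the function name and the bitmask representation are assumptions. GCC's real search is done by peep2_find_free_register, which also checks things this sketch ignores, such as fixed registers and whether the mode fits the register.

```c
#include <stdint.h>

/* Hypothetical model of (match_scratch:SF 2 "x") in a peephole2:
   bit R of LIVE_MASK is set when %xmmR is live across the matched
   insns.  Return the lowest-numbered free SSE register, or -1 when all
   of them are live, in which case the peephole would not match.  */
int
pick_free_xmm (uint32_t live_mask, int n_xmm_regs)
{
  for (int r = 0; r < n_xmm_regs && r < 32; r++)
    if (!(live_mask & (UINT32_C (1) << r)))
      return r;
  return -1;
}
```

This shows the trade-off in the suggestion. Reusing REGNO (operands[0]) always succeeds but ties the load to the destination register. A scratch can be a different register, but the peephole does not fire when no register of class "x" is free.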
Otherwise, the patch looks OK to me.
Uros.