The following patch implements general spilling of pseudos of one
class into hard registers of another class *instead of memory* in LRA.

  Currently, the patch implements spilling of general reg pseudos into
SSE regs for the Intel Core architecture, as recommended by the Intel
optimization guide.  This optimization improves both the performance
and the size of the code generated with LRA.  The size improves because
the movd insn (moving general regs to/from SSE regs) is smaller than an
x86 load/store from the stack with an address offset bigger than 128.
There is also a steady improvement in code performance from this
optimization on Intel Core processors.
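
  To make the size argument concrete, here is a small sketch that counts
encoded bytes for a 32-bit move between a general reg and a stack slot
versus a movd between a general reg and an SSE reg.  It assumes no REX
prefix (regs eax..edi and xmm0..xmm7) and a stack slot addressed off
either the stack pointer or the frame pointer; the function names are
made up for this illustration and are not part of the patch.

```c
#include <stdbool.h>

/* Bytes taken by the displacement field of a [base + disp] operand.
   A frame-pointer base always needs at least a one-byte displacement.  */
static int
disp_bytes (int disp, bool base_is_sp)
{
  if (disp == 0 && base_is_sp)
    return 0;
  if (disp >= -128 && disp <= 127)
    return 1;                   /* fits in a signed byte (disp8) */
  return 4;                     /* needs a full disp32 */
}

/* Length of a 32-bit mov load/store to [base + disp]:
   opcode + ModRM + SIB (only for a stack-pointer base) + displacement.  */
int
stack_move_len (int disp, bool base_is_sp)
{
  return 1 + 1 + (base_is_sp ? 1 : 0) + disp_bytes (disp, base_is_sp);
}

/* Length of movd between a general reg and an SSE reg:
   66 prefix + two opcode bytes + ModRM.  */
int
movd_len (void)
{
  return 4;
}
```

  For a slot at offset 256 this gives 7 bytes off the stack pointer and
6 bytes off the frame pointer, against 4 bytes for movd.  For small
offsets the stack move is no bigger than movd, so the size gain only
shows up in functions with large frames.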

  The optimization worsens code performance for AMD processors (Phenom
and Bulldozer) because the movd insn is less profitable there than
st/ld, which is obviously why X86_TUNE_INTER_UNIT_MOVES is off for such
processors.

  The optimization also worsens code performance for Intel Atom,
although one could expect the opposite as X86_TUNE_INTER_UNIT_MOVES is
on for this processor.  Interestingly, switching
X86_TUNE_INTER_UNIT_MOVES off for Atom practically does not change the
code performance without the optimization.

  The optimization might be useful for some other processors which
have direct move insns between the two classes considered, in cases
where IRA for some reason did not use the class union.  At least I see
that we could try this for ARM (spilling general regs into VFP regs)
and for the extended PowerPC architecture (spilling general regs into
fp regs).  All that is necessary is to define two macros.  I am going
to do it for ARM and see whether this optimization is beneficial for
OMAP4, although I think it is not, as the fp units with VFP regs in the
ARM implementations I know are too separate from the integer units.
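
  The kind of target-side decision described above (which pseudos may be
spilled into which other register class) can be sketched as follows.
This is a toy model, not the code from the patch: the toy_* names, the
two flags, and the mode restrictions are all assumptions made for
illustration, and only the class decision is modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for register classes and machine modes.  */
enum toy_class { TOY_NO_REGS, TOY_GENERAL_REGS, TOY_SSE_REGS };
enum toy_mode { TOY_QI, TOY_HI, TOY_SI, TOY_DI };

/* Hypothetical per-target tuning state.  */
struct toy_target
{
  bool spill_to_sse;            /* tuning flag enabling the optimization */
  bool is_64bit;                /* 64-bit general regs available */
};

/* Return the class whose hard regs may hold a spilled pseudo of class
   RCLASS and mode MODE, or TOY_NO_REGS if it must go to memory.  */
enum toy_class
toy_spill_class (const struct toy_target *t, enum toy_class rclass,
                 enum toy_mode mode)
{
  /* Only general reg pseudos are candidates, and only when the
     target tuning asks for it.  */
  if (!t->spill_to_sse || rclass != TOY_GENERAL_REGS)
    return TOY_NO_REGS;

  /* Assume a direct move exists only for 32-bit values, and for
     64-bit values on a 64-bit target.  */
  if (mode == TOY_SI || (mode == TOY_DI && t->is_64bit))
    return TOY_SSE_REGS;

  return TOY_NO_REGS;
}
```

  A target with the flag off always gets TOY_NO_REGS back, so the
allocator falls back to stack slots exactly as before.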

The patch was successfully bootstrapped on x86/x86-64 with additional
options -mtune=corei7 -march=corei7.

Committed as rev. 185884.

2012-03-27  Vladimir Makarov <>

    * common.opt (flra-reg-spill): New option.


    * target.def (spill_class, spill_class_mode): New hooks.

    * target.h: Include tm.h.

    * lra-int.h (lra_reg_spill_p): New external.

    * lra.c (lra_reg_spill_p): New global var.
    (setup_reg_spill_flag): New function.
    (lra): Call setup_reg_spill_flag.  Use lra_reg_spill_p as an
    argument for lra_create_live_ranges before spill sub-pass.

    * lra-spills.c: Include ira.h.
    (spill_hard_reg): New array.
    (struct slot): Add new member hard_regno.
    (assign_slot): Rename to assign_mem_slot.
    (assign_spill_hard_regs): New function.
    (add_pseudo_to_slot): Ditto.
    (assign_stack_slot_num_and_sort_pseudos): Rewrite using
    (remove_pseudos): Use spill_hard_reg.
    (lra_spill): Allocate, initialize, and free spill_hard_reg.
    Sort pseudo_regnos and call assign_spill_hard_regs.

    * lra-assign.c (assign_hard_regno): Use the biggest mode instead
    of the pseudo mode.

    * (lra-spills.c): Add dependence on ira.h.

    * config/i386/i386.h (enum ix86_tune_indices): Add

    * config/i386/i386.c (initial_ix86_tune_features): Add entry for
    (ix86_spill_class): New function.
    (ix86_spill_class_mode): Ditto.
