On 10/01/2013 01:55 AM, Wei Mi wrote: >> Probably the best place to add a code for this is in >> lra-constraints.c::simplify_operand_subreg by permitting subreg reload >> for paradoxical subregs whose hard regs are not fully in allocno class >> of the inner pseudo. >> >> It needs a good testing (i'd check that the generated code is not >> changed on variety benchmarks to see that the change has no impact on >> the most programs performance) and you need to add a good comment >> describing why this change is needed. >> > Vlad, thanks! I make another patch here by following your guidance. > Please check whether it is ok. Boostrap and regression ok. I am also > verifying its performance effect on google applications (But most of > them are 64 bits, so I cannot verify its performance effect on 32 bits > apps). > > The idea of the patch is here: > > For the following two types of paradoxical subreg, we insert reload in > simplify_operand_subreg: > 1. If the op_type is OP_IN, and the hardreg could not be paired with > other hardreg to contain the outermode operand, for example R15 in > x86-64 (checked by in_hard_reg_set_p), we need to insert a reload. If > the hardreg allocated in IRA is R12, we don't need to insert reload > here because upper half of rvalue paradoxical subreg is undefined so > it is ok for R13 to contain undefined data. > > 2. If the op_type is OP_OUT or OP_INOUT. > (It is possible that we don't need to insert reload for this case > too, because the upper half of lvalue paradoxical subreg is useless. > If the assignment to upper half of subreg register will not be > generated by rtl split4 stage, we don't need to insert reload here. > But I havn't got a testcase to verify it so I keep it) > > Here is a paradoxical subreg example showing how the reload is generated: > > (insn 5 4 7 2 (set (reg:TI 106 [ __comp ]) > (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64} > > In IRA, reg107 is allocated to a DImode hardreg. If reg107 is assigned > to hardreg R15, compiler cannot find another hardreg to pair with R15 > to contain TImode data. So we insert a TImode reload pseudo reg180 for > it. > > After reload is inserted: > > (insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0) > (reg:DI 107 [ __comp ])) -1 > (insn 5 4 7 2 (set (reg:TI 106 [ __comp ]) > (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0)) > {*movti_internal_rex64} > > Two reload hard registers will be allocated to reg180 to save TImode > operand in LRA_assign. > Wei Mi, thanks very much for working on this. The patch is ok for me except for one change (please see below). > > 2013-09-30 Wei Mi <w...@google.com> > > * lra-constraints.c (insert_move_for_subreg): New function. > (simplify_operand_subreg): Add reload for paradoxical subreg. > > > @@ -1215,13 +1242,9 @@ simplify_operand_subreg (int nop, enum m > && (hard_regno_nregs[hard_regno][GET_MODE (reg)] > >= hard_regno_nregs[hard_regno][mode]) > && simplify_subreg_regno (hard_regno, GET_MODE (reg), > - SUBREG_BYTE (operand), mode) < 0 > - /* Don't reload subreg for matching reload. It is actually > - valid subreg in LRA. */ > - && ! LRA_SUBREG_P (operand)) > + SUBREG_BYTE (operand), mode) < 0) You removed conditition with LRA_SUBREG for non-paradoxical subreg generated for matched operands. I think that is important condition and the comment says why. There are some 32-bit insns constraints requiring different modes (int and fp ones) for matching operands in FP regs. The condition prevents LRA cycling as such subreg can look invalid (in simplify_subreg_regno) but it is used internally in LRA for matching constraint expression and should be not reloaded.
With this change the patch is ok for me. > || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg)) > { > - enum op_type type = curr_static_id->operand[nop].type; > /* The class will be defined later in curr_insn_transform. */ > enum reg_class rclass > = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS); >