On Tue, 18 Sep 2012, Richard Sandiford wrote:

> > Have you had time to think about this some more? I am not sure I can
> > guess how you'd like me to fix this patch now without some more specific
> > review and/or suggestions about where the optimization should happen and
> > what cases it should be extended to detect in addition to the dsp
> > accumulator multiplies.
>
> The patch below is the one I've been testing.  But I got sidetracked
> by looking into the possibility of removing the MD0_REG and MD1_REG
> classes, in order to get more sensible costs.  I think that was needed
> for the madd-9.c test to pass.

Sorry to come up with this so late -- I have only now noticed this being
discussed.

> @@ -4105,39 +4105,55 @@ mips_subword (rtx op, bool high_p)
>    return simplify_gen_subreg (word_mode, op, mode, byte);
>  }
>
> -/* Return true if a 64-bit move from SRC to DEST should be split into two.  */
> +/* Return true if SRC can be moved into DEST using MULT $0, $0.  */
> +
> +static bool
> +mips_mult_move_p (rtx dest, rtx src)
> +{
> +  return (src == const0_rtx
> +          && REG_P (dest)
> +          && GET_MODE_SIZE (GET_MODE (dest)) == 2 * UNITS_PER_WORD
> +          && (ISA_HAS_DSP_MULT
> +              ? ACC_REG_P (REGNO (dest))
> +              : MD_REG_P (REGNO (dest))));
> +}
> +
> +/* Return true if a move from SRC to DEST should be split into two.  */

Does the DSP ASE guarantee that a MULT $0, $0 is not going to be slower
than MTHI $0/MTLO $0?  The latency of multiplication varies among
implementations, for example the original R3000 took 12 cycles (of course
the R3000 itself is not relevant for this change, but you get the
picture!).  On the other hand in some (but not all!) processors
multiplication runs in parallel to the main pipeline, so what matters is
the difference, if positive, between the latency of the MULT run in the
background and the number of cycles consumed by other instructions up to
the next HI/LO access instruction.

From the context I am assuming none of this matters for the 74K (and
presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
isn't it something that should be decided based on instruction costs from
the DFA schedulers?

Is there anything that I've missed here?  It doesn't appear to me that
either your proposal or the original one takes instruction cost
calculation into consideration.

  Maciej
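
For illustration only, the latency argument above can be put into a
rough C sketch.  This is not GCC code: the structure, its fields and both
function names are hypothetical, and the model assumes that a background
MULT takes one issue cycle plus whatever part of its latency is not
hidden by other instructions before the next HI/LO access.

```c
#include <stdbool.h>

/* Hypothetical per-core parameters; none of these names exist in GCC.  */
struct hilo_model
{
  int mult_latency;        /* Cycles until HI/LO are readable after MULT.  */
  int mt_pair_cost;        /* Cycles for MTHI $0 followed by MTLO $0.  */
  bool mult_in_background; /* True if MULT overlaps the main pipeline.  */
};

/* Cycles MULT $0, $0 effectively costs when INDEPENDENT_CYCLES of other
   work separate it from the next HI/LO access.  Assumes one issue cycle
   for MULT itself when the multiplier runs in the background.  */
static int
mult_zero_cost (const struct hilo_model *m, int independent_cycles)
{
  if (!m->mult_in_background)
    return m->mult_latency;

  int stall = m->mult_latency - independent_cycles;
  return 1 + (stall > 0 ? stall : 0);
}

/* Return true if zeroing HI/LO with MULT is no slower than MTHI/MTLO.  */
static bool
prefer_mult_zero_p (const struct hilo_model *m, int independent_cycles)
{
  return mult_zero_cost (m, independent_cycles) <= m->mt_pair_cost;
}
```

With a 12-cycle multiplier and only a few independent instructions
before the next HI/LO access the MTHI/MTLO pair wins in this model; with
enough independent work the MULT latency is hidden.  The real decision
would have to come from the per-core pipeline descriptions rather than
from made-up numbers like these.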