On Tue, 18 Sep 2012, Richard Sandiford wrote:
> > Have you had time to think about this some more? I am not sure I can
> > guess how you'd like me to fix this patch now without some more specific
> > review and/or suggestions about where the optimization should happen and
> > what cases it should be extended to detect in addition to the dsp
> > accumulator multiplies.
>
> The patch below is the one I've been testing. But I got sidetracked
> by looking into the possibility of removing the MD0_REG and MD1_REG
> classes, in order to get more sensible costs. I think that was needed
> for the madd-9.c test to pass.
Sorry to come up with this so late -- I have only now noticed this being
discussed.
> @@ -4105,39 +4105,55 @@ mips_subword (rtx op, bool high_p)
> return simplify_gen_subreg (word_mode, op, mode, byte);
> }
>
> -/* Return true if a 64-bit move from SRC to DEST should be split into two.  */
> +/* Return true if SRC can be moved into DEST using MULT $0, $0. */
> +
> +static bool
> +mips_mult_move_p (rtx dest, rtx src)
> +{
> + return (src == const0_rtx
> + && REG_P (dest)
> + && GET_MODE_SIZE (GET_MODE (dest)) == 2 * UNITS_PER_WORD
> + && (ISA_HAS_DSP_MULT
> + ? ACC_REG_P (REGNO (dest))
> + : MD_REG_P (REGNO (dest))));
> +}
> +
> +/* Return true if a move from SRC to DEST should be split into two. */
Does the DSP ASE guarantee that a MULT $0, $0 is not going to be slower
than MTHI $0/MTLO $0? The latency of multiplication varies among
implementations, for example the original R3000 took 12 cycles (of course
the R3000 itself is not relevant for this change, but you get the
picture!). On the other hand in some (but not all!) processors
multiplication runs in parallel to the main pipeline, so what matters
there is the latency of the MULT run in the background less the number
of cycles consumed by other instructions up to the next HI/LO access
instruction, if that difference is positive.
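To make the comparison concrete, here is a rough sketch of the arithmetic
I have in mind. It is not GCC code: the function names, the single issue
cycle charged for a background MULT and all the cycle counts are
assumptions made up for illustration only.

```c
/* Hypothetical cost model, not part of GCC.  All cycle counts are
   illustrative assumptions rather than figures for any real core.  */

/* Cycles the main pipeline stalls at the next HI/LO access when the
   multiplier runs in the background: the MULT latency less the cycles
   spent on other instructions in between, if positive.  */
int
background_stall (int mult_latency, int cycles_before_hilo_access)
{
  int diff = mult_latency - cycles_before_hilo_access;
  return diff > 0 ? diff : 0;
}

/* Effective cost in cycles of zeroing HI/LO with MULT $0, $0.
   If the multiplier blocks the pipeline, the full latency is paid.
   Otherwise one issue cycle (an assumption) plus any stall is paid.  */
int
mult_zero_cost (int mult_latency, int cycles_before_hilo_access,
                int runs_in_background)
{
  if (!runs_in_background)
    return mult_latency;
  return 1 + background_stall (mult_latency, cycles_before_hilo_access);
}

/* Return nonzero if MULT $0, $0 is no slower than an MTHI $0/MTLO $0
   pair whose cost MT_PAIR_COST the caller supplies.  */
int
mult_zero_no_slower_p (int mult_latency, int cycles_before_hilo_access,
                       int runs_in_background, int mt_pair_cost)
{
  return mult_zero_cost (mult_latency, cycles_before_hilo_access,
                         runs_in_background) <= mt_pair_cost;
}
```

With a blocking 12-cycle multiplier and an assumed 2-cycle MTHI/MTLO pair
the MULT loses; with a background multiplier and enough independent work
before the next HI/LO access it wins. The answer depends on the
implementation and on the surrounding code, which is the point of the
question.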
From the context I am assuming none of this matters for the 74K (and
presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
isn't it something that should be decided based on instruction costs from
the DFA schedulers? Is there anything that I've missed here? It doesn't
appear to me that your proposal (or the original one) takes instruction
cost calculation into consideration.
Maciej