"Maciej W. Rozycki" <[email protected]> writes:
> On Mon, 24 Sep 2012, Richard Sandiford wrote:
>
>> > From the context I am assuming none of this matters for the 74K (and
>> > presumably the 24KE/34K) and a MULT $0, $0 is indeed faster, but overall
>> > isn't it something that should be decided based on instruction costs from
>> > DFA schedulers? Is there anything that I've missed here? It doesn't
>> > appear to me your (and neither the original) proposal takes instruction
>> > cost calculation into consideration.
>>
>> In practice, we only move 0 into HI and LO for MADD- and MSUB-style
>> operations. We deliberately don't use HI and LO as scratch space.
>>
>> I think it's a reasonable default assumption that anything that supports
>> those instructions also has a fast path from MULT to MADD or MULT to MSUB.
>
> According to my sources the R4650 has a 4-cycle MULT latency (MAD is 3-4
> cycles on that processor). An MTHI/MTLO pair will take 2 cycles;
> obviously the resulting larger code may adversely affect cache performance
> in some scenarios.

That's not how the 4650 DFA models it though.

(define_insn_reservation "generic_hilo" 1
  (eq_attr "type" "mfhi,mflo,mthi,mtlo")
  "imuldiv*3")

(define_insn_reservation "r4650_imul" 4
  (and (eq_attr "cpu" "r4650")
       (eq_attr "type" "imul,imul3,imadd"))
  "imuldiv*4")

So if we believed the DFA, MTLO + MTHI would occupy the imuldiv unit for
6 cycles (2 x 3), rather than the 4 cycles that MULT occupies it.  Any
attempt to use the DFA would still favour MULT.
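
For anyone who wants to check the arithmetic, here is a rough sketch of the
occupancy comparison.  It is not how GCC evaluates reservations; it only
sums the "imuldiv*N" repeat counts from the two patterns quoted above.  The
names IMULDIV_CYCLES and imuldiv_occupancy are made up for illustration,
and it assumes that MULT $0, $0 has type "imul" and that the two moves
issue back to back with nothing else contending for the unit.

```python
# Back-of-the-envelope check of the imuldiv occupancy implied by the two
# reservations quoted above.  The table and helper below are hypothetical
# names for illustration only; they are not part of GCC.

# Cycles each instruction holds the "imuldiv" unit, read off the
# reservation strings ("imuldiv*3" for mthi/mtlo, "imuldiv*4" for imul).
# Assumption: MULT is classified as type "imul" on the r4650.
IMULDIV_CYCLES = {
    "mthi": 3,
    "mtlo": 3,
    "mult": 4,
}


def imuldiv_occupancy(insns):
    """Sum the cycles a back-to-back sequence holds the imuldiv unit."""
    return sum(IMULDIV_CYCLES[insn] for insn in insns)


mtlo_mthi = imuldiv_occupancy(["mtlo", "mthi"])
mult_zero = imuldiv_occupancy(["mult"])
print(mtlo_mthi, mult_zero)
```

Under those assumptions the pair of moves holds the unit longer than the
single MULT, which is the comparison made above.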

Richard