> You assume OOO works perfectly.
>
>        mov $0,%r11
>        mul %rcx
>        add %rax,%r10
>        mov 24(%rsi,%rbx,8),%rax
>        adc %rdx,%r11
>        mov %r10,16(%rdi,%rbx,8)
>        mul %rcx
> here   mov $0,%r8
>        add %rax,%r11
>        mov 32(%rsi,%rbx,8),%rax
>        adc %rdx,%r8
>        mov %r11,24(%rdi,%rbx,8)
>
> Moving the line at "here" up one, before the mul, slows things down from 2.78
> to 3.03 c/l (cycles per limb), whereas if OOO were perfect it should have no effect.
> This may be due to a CPU scheduler bug, or perhaps the scheduler's just not
> perfect: mul being long latency, two macro-ops, two pipes, only pipe 0_1,
> etc.
> If it's a bug, then perhaps K10 is better?

I've seen similar wackiness with the Core 2 out-of-order engine. It's
strange enough that sometimes sticking in a nop actually saves a
cycle!
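For anyone following along, here is a C sketch of what the quoted fragment appears to compute. Reading the registers, it looks like a 2-way unrolled mul_1-style loop: each mul leaves the 128-bit product in rdx:rax, the low half is added to the running carry limb, adc folds the carry flag into the high half, and that high half becomes the next carry. The name mul_1_ref is hypothetical, the register mapping is my interpretation rather than anything stated in the original loop, and unsigned __int128 assumes GCC or Clang. The sketch pins down only the arithmetic; it says nothing about instruction scheduling, which is what the timing difference above is about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference for the quoted loop (my reading of it):
   rp[i] = low limb of up[i]*v + carry, returns the final carry limb.
   Requires a compiler with unsigned __int128 (GCC/Clang). */
uint64_t mul_1_ref(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t carry = 0;               /* running high limb (r10/r11/r8 in the asm) */
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 p = (unsigned __int128)up[i] * v;  /* mul %rcx -> rdx:rax */
        uint64_t lo = (uint64_t)p;          /* rax */
        uint64_t hi = (uint64_t)(p >> 64);  /* rdx */
        lo += carry;                   /* add %rax,%r10 */
        hi += (lo < carry);            /* adc %rdx,%r11: add the carry-out; cannot overflow */
        rp[i] = lo;                    /* mov %r10,16(%rdi,%rbx,8) */
        carry = hi;                    /* high half carries into the next limb */
    }
    return carry;
}
```

Called with a 2-limb operand of all ones and v = UINT64_MAX, it should store {1, UINT64_MAX} and return UINT64_MAX - 1, since (2^128 - 1)(2^64 - 1) = 2^192 - 2^128 - 2^64 + 1.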

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---