> You assume OOO works perfectly.
>
>         mov     $0,%r11
>         mul     %rcx
>         add     %rax,%r10
>         mov     24(%rsi,%rbx,8),%rax
>         adc     %rdx,%r11
>         mov     %r10,16(%rdi,%rbx,8)
>         mul     %rcx
> here    mov     $0,%r8
>         add     %rax,%r11
>         mov     32(%rsi,%rbx,8),%rax
>         adc     %rdx,%r8
>         mov     %r11,24(%rdi,%rbx,8)
>
> Moving the line at "here" up one, before the mul, slows things down from
> 2.78 to 3.03 c/l, whereas if OOO were perfect it should not have any effect.
> This may be due to a CPU scheduler bug, or perhaps the scheduler's not
> perfect: mul being long latency, two macro ops, two pipes, only pipe 0_1,
> etc.
> If it's a bug then perhaps K10 is better?

I've seen similar wackiness with the core 2 out-of-order engine. It's strange enough that sometimes sticking in a nop actually saves a cycle!
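
For anyone following along who doesn't read the assembly easily, here is a rough C sketch of what the quoted fragment appears to compute. It assumes the fragment is part of an unrolled mul_1-style loop (multiply each source limb by a single limb and propagate the carry). The function name mul_1_sketch is made up for illustration and is not MPIR's code, and the sketch relies on the GCC/Clang unsigned __int128 extension. The `mov $0,%reg` lines correspond to starting each high limb from zero before the adc, which is why, on paper, their position relative to the mul should not matter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not MPIR's actual code: writes the low limbs of
   src[0..n-1] * c to dst and returns the final carry limb. */
static uint64_t mul_1_sketch(uint64_t *dst, const uint64_t *src,
                             size_t n, uint64_t c)
{
    uint64_t carry = 0;   /* high limb carried over from the previous step */
    for (size_t i = 0; i < n; i++) {
        /* mul %rcx: 64x64 -> 128-bit product, held in %rdx:%rax */
        unsigned __int128 p = (unsigned __int128)src[i] * c;
        uint64_t lo = (uint64_t)p;
        uint64_t hi = (uint64_t)(p >> 64);

        lo += carry;          /* add %rax,%r10 */
        hi += (lo < carry);   /* adc %rdx,%r11, with %r11 zeroed by mov $0 */

        dst[i] = lo;          /* mov %r10,16(%rdi,%rbx,8) */
        carry = hi;           /* becomes the addend for the next limb */
    }
    return carry;
}
```

The high limb of a 64x64 product is at most 2^64 - 2, so adding the carry bit to it cannot overflow. None of this says anything about timing, of course; it only pins down the dependency chain the scheduler has to respect.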
