On Jul 23, 10:59 pm, Jason Moxham <[email protected]> wrote:
> Hi
>
> I've been doing some preliminary experimentation for mul_basecase on Core i7
> Nehalem, and of course K8 and Core 2.
>
> For the AMD chips, we are currently bound by the macro-op retirement rate,
> and I didn't think we could improve it.
> Currently for addmul1 and mul1 we have
>
> mov 0,r3
> mov (src),ax
> mul cx
> add r1,-1(dst) // this is a mov for mul1
> adc ax,r2
> adc dx,r3
>
> which is 7 ops for 1 limb, which leads to 2.333c/l + loop control,
> and for addmul2 we have
>
> mov 0,r4
> mov (src),ax
> mul cx
> add r1,-1(dst)
> adc ax,r2
> adc dx,r3
> mov 1(src),ax
> mul bx
> add ax,r2
> adc dx,r3
> adc 0,r4
>
> which is 13 ops for 2 limbs, which leads to 2.166c/l + loop control.
>
> For addmul1 and mul1 we can get a perfect schedule, and with 4-way unroll we
> get 2.5c/l. This is optimal for K8, as add reg,(dst) has a max throughput of
> 2.5c. On the K10 we don't have this restriction, and with a larger unroll and
> perfect scheduling we can improve things. I've not tried this approach, as you
> would have to go to 7-way unroll to get anything better than 2.5c/l.
> For mul1 it is possible to reduce the instruction count down to 2c/l+epsilon
> like this
>
> mov (src),ax
> mul cx
> mov ax,r8
> mov dx,1(dst)
>
> mov 1(src),ax
> mul cx
> mov ax,r9
> mov dx,2(dst)
>
> mov 2(src),ax
> mul cx
> mov ax,r10
> mov dx,3(dst)
>
> mov 3(src),ax
> mul cx
> #mov ax,r11
> mov dx,4(dst)
>
> add r12,r12
>
> adc r8,(dst)
> adc r9,1(dst)
> adc r10,2(dst)
> adc ax,3(dst)
>
> sbb r12,r12
>
> add 4,count
> jne loop
>
> which is 27 ops for 4 limbs = 2.25c/l for mul_1 on K10, but the best I could
> get is 5c/l. It's hardly surprising given how many "rules" the above breaks.

The new AMD Phenom II (we will call this chip K10_2) runs this code at 2.5c/l.
This suggests they have not changed the pick hardware, but have improved the
store forwarding (much more like Intel's). Our mpn_addadd, addsub and subadd
all now run at the predicted optimal speed on the new K10_2; the same goes for
addlsh1 and some others.

Jason
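
For anyone checking the arithmetic, the c/l figures quoted in this thread are what you get by dividing the macro-op count by the number of limbs and by the retirement width. The sketch below assumes a retirement width of 3 macro-ops per cycle on K8/K10 (the value the quoted 7 ops -> 2.333c/l implies). It takes the op counts from the message as given and ignores loop control. The function name cycles_per_limb is made up for illustration.

```python
from fractions import Fraction

# Assumption: K8/K10 retire at most 3 macro-ops per cycle.
RETIRE_WIDTH = 3


def cycles_per_limb(macro_ops, limbs, retire_width=RETIRE_WIDTH):
    """Lower bound on cycles per limb from the retirement rate alone.

    Loop-control overhead and scheduling limits are ignored.
    """
    return Fraction(macro_ops, limbs * retire_width)


# Op counts as stated in the message (not re-derived here).
bounds = {
    "addmul1/mul1 (7 ops, 1 limb)": cycles_per_limb(7, 1),
    "addmul2 (13 ops, 2 limbs)": cycles_per_limb(13, 2),
    "mul1 variant (27 ops, 4 limbs)": cycles_per_limb(27, 4),
}

for name, bound in bounds.items():
    print(f"{name}: {float(bound):.3f} c/l")
```

These are retirement-rate bounds only. As noted above, the measured speed can be far worse when other limits such as store forwarding get in the way.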
