I had a go at implementing the suggestion to split the inputs to toom3
along non-limb boundaries. Basically, if n = 1 mod 3, I split along a
half-limb boundary.

Previously we split into pieces of k+1, k+1 and k-1 limbs, where
k = n / 3 (rounded down). Because the evaluation step adds pieces
together, the sums usually spill into an extra limb, so one ends up
doing multiplications of k+2 limbs. Splitting along half-limb
boundaries instead gives pieces of k+1/2, k+1/2 and k limbs, and even
after the additions one only has to multiply k+1 limbs.

But the whole thing didn't work: it was about 5% slower on average.
There are two reasons. First, an extra copy of the first k+1/2 limbs
of each operand has to be made. Second, in a number of places one is
adding operands that are shifted by half a limb to operands that are
not. I am assuming that this misaligned data costs a lot of cycles.

Bill.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---
