Hmm, so the only reason MPIR can be beating me for small
multiplications is general coding efficiency. My butterfly strategy
probably introduces overheads when the coefficients are small, i.e.
only a few limbs.

The overheads could be lowered by having an iterative FFT for short
lengths. But we should worry about that later. It will complicate the
code too much and the existing code needs consolidation first. There's
a lot of code duplication currently.

Bill.

2010/1/3 Bill Hart <[email protected]>:
> 2010/1/3 David Harvey <[email protected]>:
>>
>>
>> On Jan 3, 9:03 am, Bill Hart <[email protected]> wrote:
>>
>>> I figured I probably did misunderstand. What I think they are trying
>>> to get at is that one can use coefficients modulo 2^N+1 except at
>>> certain levels because the roots of unity are actually the same. But
>>> then the discussion becomes one of combining a Fermat and Mersenne
>>> transform, which is the basic strategy that the implementation uses. I
>>> got confused. I still am.
>>
>> I think they are saying that to multiply two integers whose product
>> has N bits, choose a and b (as in Lemma 1 of their paper) such that a
>> + b ~ N and then multiply modulo 2^a - 1 (use Mersenne transform) and
>> modulo 2^b + 1 (use Fermat transform). The smoothness comes from the
>> freedom to choose a and b.
>>
>
> OK, I think I see. Their big N corresponds to an FFT for the original
> integers, which are multiplied mod 2^N+1 in the case of a Fermat
> transform or 2^N-1 in the case of a Mersenne transform. I think I
> would call these negacyclic and cyclic convolutions respectively.
>
> But then a Mersenne transform is nothing but an ordinary FFT, no?
>
> The idea of combining with a negacyclic FFT seems to be one of the
> strategies I discussed and discarded.
>
> I still don't understand what they mean by "power-of-2 roots of unity
> are needed only at the "lower level"". I understand that by the "lower
> level" they seem to mean the pointwise mults. But what is a power of 2
> root of unity? Aren't they all powers of 2, except when using the
> sqrt2 trick.
>
> Maybe I don't understand the Schonhage-Strassen algorithm correctly.
> Perhaps they not only worked with coefficients mod 2^B+1 but also did
> the convolution mod x^m+1. They were worried about asymptotic
> complexity, not speed, so they wanted their algorithm to recurse so
> that the pointwise mults could be done using the same algorithm. Gosh
> that is exactly what the authors of this paper say!
>
> So my misunderstanding came from the fact that I assumed SS advocated
> a Mersenne transform, i.e. mod x^m-1, not a Fermat transform, mod
> x^m+1.
>
> Well, just to be clear. I worked mod x^m-1 because it is more
> efficient, and in a ring mod p where p = 2^B+1, for which one must
> perform a negacyclic convolution if one wishes to recurse, which
> involves twiddling to get an ordinary cyclic convolution.
>
> Anyhow, this is all a non-strategy once you have truncation. All that
> is required is to implement a single transform which works with
> coefficients mod 2^B-1, which I was formerly calling a Mersenne ring,
> when the large integers being multiplied are "small". Then you don't
> want to recurse for doing the pointwise mults and so the pointwise
> mults can be done mod 2^B/2-1 and 2^B/2+1 using our mpn_mulmod_2expp1
> and mpn_mulmod_2expm1, which can do whatever is most efficient.
>
> Bill.
>

--

You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.


Reply via email to