On Sun, Oct 02, 2022 at 01:18:32PM +0800, Qian Yun wrote:
> I'm talking about multiplication in SparseUnivariatePolynomial,
> it is implemented in Domain PolynomialRing, the relevant part starts
> at line 431 of poly.spad. (aka the 'addm!' local function.)
>
> First, this function has complexity of O(n^3) for "truly sparse" input:
>
> By "truly sparse" I mean multiplication of polynomials with M and N
> terms results in (close to) MxN terms, instead of (M+N) terms.
>
> n := 400 ; f := reduce(+,[monomial(1,x)$SUP INT for x in 1..n]) ; g :=
> reduce(+,[monomial(1,x*n)$SUP INT for x in 1..n]) ;f*g;
>
> You can check in the above test that, n:=800 takes 8 times longer than
> n:=400.
>
> The reason is that for "truly sparse" cases, we iterate through the
> "res" list N times, and each time it grows by N terms, resulting in a
> total number of list iterations of N+2*N+3*N+...+N*N ~= 1/2*N^3.
>
> Now, compare with "dense" cases:
>
> n := 5000 ; f := reduce(+,[monomial(1,x)$SUP INT for x in 1..n]) ; g :=
> reduce(+,[monomial(1,x)$SUP INT for x in 1..n]) ;f*g;
>
> The time complexity is O(n^2). (Because "res" is growing by 1 each time
> in this case.)
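The two growth rates above can be checked with a small Python model
(illustration only, not the actual Spad code) of the merge-into-sorted-list
scheme, counting traversal steps:

```python
# Model of the addm!-style scheme: the product is accumulated in a
# sorted list, and for each term of p a shifted copy of q is merged in.
# We count traversal steps to compare the sparse and dense cases.

def count_merge_steps(p_exps, q_exps):
    """Count list steps when accumulating p*q (exponents only) by
    repeatedly merging a shifted copy of q into a sorted result."""
    res = []
    steps = 0
    q_sorted = sorted(q_exps)
    for e in sorted(p_exps):
        new = [e + f for f in q_sorted]      # shifted copy, still sorted
        merged = []
        i = j = 0
        while i < len(res) and j < len(new):
            steps += 1
            if res[i] < new[j]:
                merged.append(res[i]); i += 1
            elif res[i] > new[j]:
                merged.append(new[j]); j += 1
            else:                            # equal: coefficients would add
                merged.append(res[i]); i += 1; j += 1
        steps += (len(res) - i) + (len(new) - j)
        merged += res[i:] + new[j:]
        res = merged
    return steps

n = 100
sparse = count_merge_steps(range(1, n + 1), [n * k for k in range(1, n + 1)])
dense = count_merge_steps(range(1, n + 1), range(1, n + 1))
print(sparse, dense)   # 505000 14950 -- sparse ~ n^3/2, dense ~ 3*n^2/2
```

Doubling n multiplies the sparse count by about 8 and the dense count by
about 4, matching the O(n^3) vs O(n^2) behaviour observed above.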
>
> ====
>
> Now, what about realistic scenarios, is it dense or sparse?
Multivariate. In the case of 'integ.input' there is a good chance
that most calls to SUP multiplication come from multivariate
multiplications.
> I modify the definition of "*" to print the number of terms in p, q and
> (p*q), and run the tests in src/input/integ.input as the workload.
>
> The result is 2234170 multiplication calls in total.
>
> The biggest one multiplies polynomials with 67 and 68 terms,
> resulting in 134 terms.
>
> So, at least for integration, there is no big SUP multiplication, and
> the polynomials are dense in this case.
>
> In the following table, the second column is the number of terms in
> the multiplication result and the first column is its frequency:
>
> 37361 10
> 35673 9
> 41543 8
> 40835 7
> 56356 6
> 61486 5
> 164553 4
> 1072030 3
> 183491 2
> 206388 1
>
> So out of the 2 million multiplications, over half produce results
> with at most 3 terms, and 85% of results have at most 10 terms.
>
> ====
>
> So the first conclusion is to optimize for small inputs. There's not
> much room for that, I think.
>
> For bigger inputs, I think current implementation is bad both ways:
>
> a) For sparse cases, simply chaining the MxN terms together, then
> sorting and de-duping, is O(N^2*log(N)), better than the current
> O(N^3).
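For concreteness, the chain-sort-dedupe scheme can be modelled in Python
(illustration only, not Spad; the (exponent, coefficient) pair
representation is just for the sketch):

```python
# Model of proposal (a): form all M*N partial products, sort them by
# exponent, then sum runs of equal exponents.  The sort dominates:
# O(M*N*log(M*N)) comparisons.

from itertools import groupby

def mul_sort_dedupe(p, q):
    """p, q: lists of (exponent, coefficient) pairs.  Returns the
    product in the same form, exponents ascending, zeros dropped."""
    pairs = sorted((ep + eq, cp * cq) for ep, cp in p for eq, cq in q)
    res = []
    for e, grp in groupby(pairs, key=lambda t: t[0]):
        c = sum(c for _, c in grp)
        if c != 0:
            res.append((e, c))
    return res

# (x + x^2) * (1 + x) = x + 2*x^2 + x^3
print(mul_sort_dedupe([(1, 1), (2, 1)], [(0, 1), (1, 1)]))
# [(1, 1), (2, 2), (3, 1)]
```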
There are some extra aspects:
- Memory allocation is much more expensive than coefficient operations
on small integers or than list traversal, so we prefer "in place"
versions for moderate lengths.
- When coefficient operations are expensive (at the top level of
multivariate multiplication), even N^3 list operations are not that
bad, so the main case to optimize is when coefficients are simple
and small.
- Some sparse cases are "structural", say an arithmetic progression
of exponents. This is more frequent in the multivariate case, but
sometimes happens in the univariate case too. With respect to the
number of terms such multiplication is more similar to the dense one.
- In really large dense cases we may be limited more by memory
than by runtime, so we want to de-dupe relatively early.
- For the sparse case it probably makes sense to optimize the
multivariate case from the start. There are several approaches in
this direction: Giac uses a hash table to collect similar terms;
others used various heaps and buckets that simultaneously
de-dupe and sort.
There are publications giving details of fast multiplication (and
also division and gcd) for the large sparse case by Monagan and
Pearce (their code is included in Maple) and by Gastineau and Laskar
(they develop the Trip system).
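As a concrete illustration of the heap idea (a rough Python sketch of the
Johnson-style algorithm refined by Monagan and Pearce, not FriCAS code):
each term of p carries a cursor into q, and a min-heap keyed on the
exponent of the current partial product delivers terms in order while
merging equal exponents, so sorting and de-duping happen simultaneously.

```python
# Heap-based sparse multiplication: the heap holds one cursor per term
# of p, keyed by the exponent of the current partial product p[i]*q[j].

import heapq

def mul_heap(p, q):
    """p, q: lists of (exponent, coefficient), exponents ascending."""
    if not p or not q:
        return []
    # heap entries: (exponent of p[i]*q[j], i, j)
    heap = [(p[i][0] + q[0][0], i, 0) for i in range(len(p))]
    heapq.heapify(heap)
    res = []
    while heap:
        e, i, j = heapq.heappop(heap)
        c = p[i][1] * q[j][1]
        # merge all pending partial products with the same exponent
        while heap and heap[0][0] == e:
            _, i2, j2 = heapq.heappop(heap)
            c += p[i2][1] * q[j2][1]
            if j2 + 1 < len(q):
                heapq.heappush(heap, (p[i2][0] + q[j2 + 1][0], i2, j2 + 1))
        if j + 1 < len(q):
            heapq.heappush(heap, (p[i][0] + q[j + 1][0], i, j + 1))
        if c != 0:
            res.append((e, c))
    return res
```

The heap never holds more than len(p) entries, which also helps with the
memory concern above; with an expensive coefficient domain the coefficient
operations stay at the minimum MxN.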
> b) For dense cases, using an actual dense representation like an
> array should be a great improvement over the current implementation:
> we only need an array of length slightly larger than (M+N) to store
> the temporary result.
The main possible gain here is in the modular case; for that we have
POLYVEC (but, as I wrote, more code like this would be useful).
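For completeness, the array scheme from point (b) in miniature (a Python
stand-in, not Spad; the modular gain would come from packing machine-sized
coefficients into vectors as POLYVEC does, which is not shown here):

```python
# Dense schoolbook multiplication into a flat coefficient array:
# one allocation of deg(p) + deg(q) + 1 slots, accumulation in place.

def mul_dense(p, q):
    """p, q: coefficient lists where p[i] is the coefficient of x^i."""
    if not p or not q:
        return []
    res = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        if a == 0:
            continue                 # cheap skip for sparse-ish input
        for j, b in enumerate(q):
            res[i + j] += a * b
    return res

# (1 + x)^2 = 1 + 2*x + x^2
print(mul_dense([1, 1], [1, 1]))     # [1, 2, 1]
```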
> ====
>
> What's your comment? If there's no disagreement for my analysis,
> I can start writing actual patch to address the issues mentioned above.
> And we can discuss that patch and performance numbers next time.
--
Waldek Hebisch
To view this discussion on the web visit
https://groups.google.com/d/msgid/fricas-devel/20221004144132.GA22916%40fricas.math.uni.wroc.pl.