I'm talking about multiplication in SparseUnivariatePolynomial.
It is implemented in the domain PolynomialRing; the relevant part
starts at line 431 of poly.spad (the 'addm!' local function).
First, this function has O(n^3) complexity for "truly sparse" input.
By "truly sparse" I mean that multiplying polynomials with M and N
terms yields (close to) MxN terms, instead of (M+N) terms.
n := 400;
f := reduce(+, [monomial(1,x)$SUP INT for x in 1..n]);
g := reduce(+, [monomial(1,x*n)$SUP INT for x in 1..n]);
f*g;

You can check with the above test that n := 800 takes 8 times longer
than n := 400.
The reason is that in "truly sparse" cases we iterate through the
"res" list N times, and it grows by N terms on each pass, so the
total number of list-node visits is N + 2*N + 3*N + ... + N*N ~= N^3/2.
Now, compare with "dense" cases:
n := 5000;
f := reduce(+, [monomial(1,x)$SUP INT for x in 1..n]);
g := reduce(+, [monomial(1,x)$SUP INT for x in 1..n]);
f*g;

Here the time complexity is O(n^2), because "res" grows by only 1
term per merge in this case.
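To make the contrast between the two cases concrete, here is a toy
Python model of the merge loop (my own sketch, not the actual 'addm!'
SPAD code; term lists are (exponent, coefficient) pairs in decreasing
exponent order, as in SUP), counting list-node visits:

```python
def merge_count(res, part):
    """Merge sorted term list `part` into sorted list `res`.

    Terms are (exponent, coefficient) pairs in decreasing exponent
    order.  Returns (merged_list, nodes_visited)."""
    out, visited, i, j = [], 0, 0, 0
    while i < len(res) and j < len(part):
        visited += 1
        ei, ej = res[i][0], part[j][0]
        if ei > ej:
            out.append(res[i]); i += 1
        elif ei < ej:
            out.append(part[j]); j += 1
        else:
            c = res[i][1] + part[j][1]
            if c != 0:
                out.append((ei, c))
            i += 1; j += 1
    visited += (len(res) - i) + (len(part) - j)
    out.extend(res[i:]); out.extend(part[j:])
    return out, visited

def mul_cost(f, g):
    """Multiply term lists by merging one partial product per term of g
    into the accumulator `res`, as the current implementation does."""
    res, total = [], 0
    for eg, cg in g:
        part = [(ef + eg, cf * cg) for ef, cf in f]
        res, visited = merge_count(res, part)
        total += visited
    return res, total

n = 50
f = [(x, 1) for x in range(n, 0, -1)]             # x + x^2 + ... + x^n
g_sparse = [(x * n, 1) for x in range(n, 0, -1)]  # x^n + x^(2n) + ... + x^(n*n)
res_sparse, sparse_cost = mul_cost(f, g_sparse)
res_dense, dense_cost = mul_cost(f, f)
# sparse: res grows by n terms per merge -> ~n^3/2 node visits
# dense:  res grows by 1 term per merge  -> ~n^2 node visits
```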
====
Now, what about realistic scenarios: is typical input dense or sparse?
I modified the definition of "*" to print the number of terms in p, q,
and p*q, and ran the tests in src/input/integ.input as the workload.
The result is 2234170 multiplication calls in total.
The biggest one multiplies polynomials with 67 and 68 terms, producing
134 terms.
So, at least for integration, there are no big SUP multiplications,
and the polynomials are dense in this case.
In the following table, the first column is the number of
multiplications and the second column is the number of terms in the
result:

    count   terms
    37361    10
    35673     9
    41543     8
    40835     7
    56356     6
    61486     5
   164553     4
  1072030     3
   183491     2
   206388     1
So out of the 2 million multiplications, over half produce results
with 3 or fewer terms, and 85% of results have fewer than 10 terms.
====
So the first conclusion is to optimize for small inputs. There's not
much room for that, I think.
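For what it's worth, one hypothetical micro-optimization in that
direction (a Python sketch of the idea, not SPAD; `mul_by_monomial`
is a name I made up): when one operand has a single term, the product
needs no merging at all, just an exponent shift and coefficient
scaling.

```python
def mul_by_monomial(f, e, c):
    """Multiply term list f (pairs of (exponent, coefficient) in
    decreasing exponent order) by the single term c*x^e: shift
    exponents and scale coefficients; decreasing order is preserved.
    Terms whose coefficient vanishes (possible over a ring with
    zero divisors) are dropped."""
    if c == 0:
        return []
    return [(ef + e, cf * c) for ef, cf in f if cf * c != 0]
```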
For bigger inputs, I think the current implementation is bad both ways:
a) For sparse cases, simply chaining the MxN partial terms together,
then sorting and deduplicating, is O(N^2*log(N)), better than the
current O(N^3).
b) For dense cases, using an actual dense representation such as an
array should be a great improvement over the current implementation:
we only need an array of length slightly larger than (M+N) to store
the temporary result.
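To make (a) and (b) concrete, here are Python sketches of both
proposals (my own illustrations of the ideas, not patches against
poly.spad; terms are (exponent, coefficient) pairs in decreasing
exponent order):

```python
def mul_sort_dedupe(f, g):
    """Proposal (a), for sparse inputs: form all M*N partial terms,
    sort them by exponent, then combine equal exponents in one pass.
    Cost: O(M*N*log(M*N)) for the sort plus O(M*N) for the combine."""
    terms = sorted(((ef + eg, cf * cg) for ef, cf in f for eg, cg in g),
                   reverse=True)
    res = []
    for e, c in terms:
        if res and res[-1][0] == e:
            res[-1] = (e, res[-1][1] + c)   # same exponent: add coefficients
        else:
            res.append((e, c))
    return [(e, c) for e, c in res if c != 0]

def mul_dense(f, g):
    """Proposal (b), for dense inputs: accumulate into a coefficient
    array of length deg(f) + deg(g) + 1, then convert back to terms."""
    if not f or not g:
        return []
    acc = [0] * (f[0][0] + g[0][0] + 1)   # leading term carries the degree
    for ef, cf in f:
        for eg, cg in g:
            acc[ef + eg] += cf * cg
    return [(e, acc[e]) for e in range(len(acc) - 1, -1, -1) if acc[e] != 0]
```

Both return the same term list; e.g. (x + 1)*(x - 1) gives
[(2, 1), (0, -1)], i.e. x^2 - 1.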
====
What are your comments? If there's no disagreement with my analysis,
I can start writing an actual patch to address the issues mentioned
above, and we can discuss that patch and performance numbers next time.
- Qian
--
You received this message because you are subscribed to the Google Groups "FriCAS -
computer algebra system" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/fricas-devel/25058e46-82bb-9364-0e6c-2e0edcee21dc%40gmail.com.