I wrote in the past that we should make more use of modular
methods and that for this we need optimized low-level operations.
I have now made significant progress in this direction.
You can find a preview at:

http://www.math.uni.wroc.pl/~hebisch/fricas/mod1

To compile do:

)read compi.input

Once all files are compiled it should be enough to do:

)read load.input

The mop-tst.input and misol-tst.input files and comments in
matop.spad contain some example inputs.

I have tried to stay if possible in Spad, however I use a few
lisp macros (in ops32-64.lisp) as primitives.

u32vec.spad implements vectors of unsigned 32-bit integers, which are
crucial data structure for fast modular computation.  matop.spad
implements relatively fast modular LU decomposition and modular
matrix inverse (we could easily add other operations like
determinant).  vecrec2.spad is intended to be general package
for reconstruction of integer and rational results from modular
ones.  misol.spad shows sample application: solving system
of linear equations with integer coefficients over Q.  Earlier
versions of misol was posted on this list, but that one is
much faster due to faster modular operations.

To give you some idea about speed, on my machine modular
inverse of 100 by 100 matrix takes about 5.2 ms, LU decomposition
about 1.8 ms, misol can solve 100 by 100 system of equations with
small random coefficients in 75 ms.  Theoretical peak for
my machine is 0.208 ms for 100 by 100 matix inverse, so we are still
a lot (about 25 times) below theoretical peak.  For 1000 by 1000
matrix we need 3.06 sec, while theoretical peak is 0.208 sec so
this time we are 15 times below theoretical peak.  Still, this
is much faster than current generic routines.

Currently I get this speed only using 64-bit sbcl (and safety 0
setting).  On 32-bit machines or using other Lisp speed is much
lower, for example currently on 100 by 100 modular inverse ECL is
15 times slower.  However, it should be possible to enhance ECL
so that for matop it will generate faster code than sbcl.  Also,
for ECL penalty for using 32-bit machines should be much smaller
than for sbcl.

Why having 64-bit machine matter: we want efficient operations
for all intermediate quantities which apper in modular computations.
If we have to 32 bit numbers then product needs 64 bits.  When
computing dot products we want to avoid computing remainders of
intermediate sums, so we need to add several product without
overflow -- for dot product with 1000 terms this limits primes
to about 25 bits.  If we limited intermediate results to 32 bits,
then effectively this whould limit primes to about 10 bits,
whic is way too small. 

I have several other routines at various stage of developement,
in particular new routines for guessing and polynomial gcd.
My plan is to start adding modular routines to trunk in the
near future.  Let me add that currently the new code is
rather ugly and somewhat incoherent.  I want to clean it
somewhat and make more generic, but I also feel that it already
may give us substantial performance benefits.  Also, only
putting code to use will show us what needs to be
changed.

Comments welcome.

-- 
                              Waldek Hebisch
[email protected] 

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/fricas-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to