oldk1331 wrote:
> 
> I saw this change in openaxiom that add map!
> in List, I benchmark it and find out it makes a huge
> difference:
> 
> x := new(10^8,1)$List INT;
> )time on
> map!(y+->y,x); --1.76s before; 0.79s after
> map!(y+->y+1,x); -- 2.44s before; 1.23s after
> 
> 
> 1. I wonder which functions else can gain performance
> simply by repeat definition?

A lot.  Basically, computation advances when some basic
operation is performed; everything else is overhead.
In Spad the normal case is a function call, and a
Spad function call is compiled to an indirect function
call at the lower level.  This means that the lower-level
compiler has almost no possibility to optimize calls,
so optimization is limited to a single function.
Of course, at the lowest level the Spad compiler invokes
primitive operations.  For example, operations
in List are implemented via Lisp macros.  But
without any extra optimizations any user of List
operations would have to access them via Spad
function calls.  To get better efficiency, when
it is known that a given operation comes from, say,
List, and the Spad compiler thinks it is simple enough
(basically a single non-nested Lisp expression),
the Spad compiler will replace the call by an inline
version.  However, we want "generic" code, that
is, code which works for various domains.  For
example, a category default should work for all
domains of the given category.  So when compiling
category defaults the Spad compiler normally does
not know for which domain the code will be
used, which typically means it does not know
the sources of various operations.
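A rough sketch of the generic situation (the operation
'sumAll' and its body are hypothetical, just for
illustration):

     -- Hypothetical category default: % is not known at
     -- compile time, so empty?, first and rest are indirect
     -- calls through the domain's operation table on every
     -- iteration.
     sumAll(l : %) : R ==
         s : R := 0
         while not empty? l repeat
             s := s + first l
             l := rest l
         s

When the compiler instead knows the value really is a List,
calls like 'first' and 'rest' are simple enough to be
replaced inline by their Lisp expansions, removing the
per-iteration call overhead.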

In a few cases where functions from category defaults
affect performance in a significant way we duplicate
the definitions and add conditions before them.  You can
look at MatrixCategory.  We have a macro 'Ops'
which defines several operations.  And then
we have:

     -- Specialize to get more efficient code
     if % is Matrix(R) then
         Ops
     else
         Ops

In the 'Matrix(R)' branch it is known that we compile
for 'Matrix(R)', so the Spad compiler can inline operations
like accessors and setters from 'Matrix(R)'.  The
other branch produces truly generic code.
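Schematically the trick looks like this (the body of 'Ops'
here is a hypothetical stand-in, not the actual
MatrixCategory code):

     -- Hypothetical stand-in for the real 'Ops' macro:
     -- one definition, compiled twice under different
     -- conditions.
     Ops ==>
         sumEntries(m : %) : R ==
             s : R := 0
             for i in minRowIndex m .. maxRowIndex m repeat
                 for j in minColIndex m .. maxColIndex m repeat
                     s := s + m(i, j)
             s

     if % is Matrix(R) then
         Ops      -- m(i, j) can compile to inlined array access
     else
         Ops      -- m(i, j) stays an indirect call

The macro avoids writing the definitions twice in the
source; the condition lets the compiler specialize one of
the two compiled copies.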

My policy concerning optimizations was mostly reactive:
I look at profile data and try to eliminate the expensive
part.  Sometimes this means an algorithmic improvement
(like replacing an expensive operation by a cheaper one).
Sometimes I noticed that some operation was expensive
because of a generic definition -- then I looked at
opportunities to improve performance via inlining.

For example, when working on the guessing package I wanted
it to handle big problems, which meant it had to be
fast enough.  Profile data showed that in an otherwise
O(n^3) routine, two-dimensional array accesses (the routine
performed O(n^2) of them), which should be insignificant,
showed up as a measurable portion (IIRC 15%) of the runtime.  At
that time two-dimensional array access worked via function
calls.  So I modified the code so that they could be inlined,
and after that they were insignificant in this routine.

Comparison in direct products showed up in profiles
for Groebner bases: "distributed" polynomials use
a direct product as part of their representation,
and comparisons take a nontrivial part of the time
needed for polynomial operations.

> 2. The deeper question, can compiler do this optimization
> automatically?

Well, the main point is: is the optimization worthwhile?
The Spad compiler currently contains broken code which would
generate specialized versions of routines given an appropriate
list of conditions.  The weaknesses of this code are:
- one has to manually specify which specializations to
  produce
- it specializes all routines in a given domain/package,
  while normally only some routines are worth specializing
- it specializes each routine separately, while taking them
  as a block could uncover more specializations

In principle we could add code which, given a bunch of tests,
would produce a list of routines which after specialization
give a measurable improvement in execution time.  Namely,
sbcl contains a profiler which can tell you how much time
is spent in a given routine and in routines called from
it.  So we could generate a list of routines calling
potentially inlinable operations and try to specialize
them.  Finding a useful specialization is not obvious, but
in principle for a given routine the Spad compiler could
produce a list of conditions allowing specialization.

BTW, the Aldor compiler can do many more optimizations than
the Spad compiler.  But IIUC it can not automatically decide
which routines are worth specializing.

BTW2: In the case of 'map' or 'map!' with an explicit functional
argument the best optimization is to inline the definition of
map into the current code, then inline the functional argument
into it.  With a good low-level optimizer this can give a
10-100 times speedup, much more than speeding up 'map' itself
can give.
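For illustration, after both inlining steps the second call
from the benchmark above would reduce to roughly this loop
(a sketch, assuming 'setfirst!' as the destructive list
update; a real optimizer would of course work on the
lower-level code instead):

     -- map!(y +-> y + 1, x) with both the definition of map!
     -- and the closure inlined: no per-element function call
     -- remains in the loop.
     l := x
     while not empty? l repeat
         setfirst!(l, first l + 1)
         l := rest l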

-- 
                              Waldek Hebisch
