On Fri, Apr 19, 2024 at 01:18:13PM +0200, Waldek Hebisch wrote:
> On Sat, Apr 13, 2024 at 12:54:22PM +0200, Dima Pasechnik wrote:
> > 
> > But that's what ECL is doing, compiling to C and then calling a C compiler? 
> > Would it be more useful to figure out why the resulting FriCAS runs 
> > considerably slower than the one built using SBCL ?
<snip> 
> ECL is much worse, in many cases it generates calls to general
> routines instead of directly doing the job.  My benchmarking
> indicated that function calls in ECL are much slower than
> in sbcl.

A little update on this: recent ECL actually generates reasonable
code for function calls.  I had a little benchmark which measures
the speed of calls, and I reran it.  It turns out that the main
slowdown is due to the way ECL implements its threading support.
The benchmark does 10000000 iterations with a chain of 3 calls per
iteration.  The innermost call increments a counter.  Results are:

 0.120 sec sbcl-2.2.9
 0.239 sec ecl-23.9.9 with threads disabled
 0.270 sec gcl-2.6.14
 0.646 sec ecl-23.9.9 default build

So with threading disabled, on this benchmark ECL is slightly
faster than GCL and reaches about half of sbcl speed.  Threading
makes ECL about 5.4 times slower than sbcl.  More precisely,
with threading enabled each ECL-compiled function makes a call
to 'pthread_getspecific'.  Essentially the only difference
between enabling threading and disabling it is the presence
of this call.

Concerning the factor of 2 between "good" ECL and sbcl:
- ECL generates extra stores to global variables (probably
  as part of debugging support)
- ECL generates extra checks
- ECL vectors use double indirection while sbcl uses simple
  indirection, so ECL-generated code needs more loads than sbcl code
- sbcl uses tail call optimization, while gcc compiles ECL code to
  normal calls

At the machine code level sbcl code does not look great, but it
is performing much less work than ECL code.

I also tried this on a larger scale and compiled FriCAS using
single-threaded ECL.  Results are:

                 ecl-23.9.9 default    ecl-23.9.9 no threads
Build:    real      12m27.006s               9m34.639s
          user      46m56.687s              38m55.354s
          sys        3m47.180s               2m29.636s
Tests:    real       2m33.963s               1m33.818s
          user      14m23.708s               7m34.652s
          sys        0m37.619s               0m20.587s

As you can see, there is a substantial reduction in CPU time and
a slightly smaller reduction in real time.  Part of the reduction
in CPU time is likely because multithreaded ECL is running something
(probably garbage collection) in extra threads.  The testsuite runs
in parallel, but some long-running tests increase real time, so the
relation between real time and CPU time is a bit complicated.

-- 
                              Waldek Hebisch

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/fricas-devel/ZiRYZU-p3wToGGEt%40fricas.org.
