On Fri, Apr 19, 2024 at 01:18:13PM +0200, Waldek Hebisch wrote:
> On Sat, Apr 13, 2024 at 12:54:22PM +0200, Dima Pasechnik wrote:
> >
> > But that's what ECL is doing, compiling to C and then calling a C compiler?
> > Would it be more useful to figure out why the resulting FriCAS runs
> > considerably slower than the one built using SBCL ?
<snip>
> ECL is much worse, in many cases it generates calls to general
> routines instead of directly doing the job. My benchmarking
> indicated that function calls in ECL are much slower than
> in sbcl.
A little update on this: recent ECL actually generates reasonable
code for function calls. I had a little benchmark which measures
the speed of calls and retried it. It turns out that the main slowdown
is due to the way ECL implements its threading support. The benchmark
does 10000000 iterations with a chain of 3 calls per iteration. The
innermost call increments a counter. Results are:
0.120 sec sbcl-2.2.9
0.239 sec ecl-23.9.9 with threads disabled
0.270 sec gcl-2.6.14
0.646 sec ecl-23.9.9 default build
So with threading disabled ECL is slightly faster than GCL on this
benchmark and reaches about half of sbcl speed. With threading
enabled ECL is about 5.4 times slower than sbcl. More precisely,
with threading enabled each ECL compiled function calls
'pthread_getspecific'. Essentially the only difference between
enabling threading and disabling it is the presence of this call.
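To make the shape of the benchmark and the extra call concrete, here is a minimal C sketch. It is an analogue written by hand, not code generated by ECL: the names `env_key`, `fake_env`, `run_plain` and `run_tls` are made up for illustration, and the real benchmark was a Lisp program.

```c
/* Illustrative sketch only, not code generated by ECL.
   env_key, fake_env and all function names are hypothetical. */
#include <pthread.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))   /* keep the calls as real calls */
#else
#define NOINLINE
#endif

static pthread_key_t env_key;                 /* key for the per-thread environment */
static pthread_once_t env_once = PTHREAD_ONCE_INIT;
static int fake_env;                          /* stand-in for a per-thread environment */
static volatile long counter;                 /* volatile so increments are not folded away */

static void make_env_key(void) { pthread_key_create(&env_key, NULL); }

/* Chain of 3 calls with no per-thread lookup ("threads disabled"). */
NOINLINE static void plain_inner(void)  { counter++; }
NOINLINE static void plain_middle(void) { plain_inner(); }
NOINLINE static void plain_outer(void)  { plain_middle(); }

/* Same chain, but every function first looks up the per-thread
   environment ("default build"). */
NOINLINE static void tls_inner(void)  { (void)pthread_getspecific(env_key); counter++; }
NOINLINE static void tls_middle(void) { (void)pthread_getspecific(env_key); tls_inner(); }
NOINLINE static void tls_outer(void)  { (void)pthread_getspecific(env_key); tls_middle(); }

/* Run the plain chain `iterations` times and return the final counter. */
long run_plain(long iterations)
{
    counter = 0;
    for (long i = 0; i < iterations; i++)
        plain_outer();
    return counter;
}

/* Run the chain with per-thread lookups and return the final counter. */
long run_tls(long iterations)
{
    pthread_once(&env_once, make_env_key);
    pthread_setspecific(env_key, &fake_env);
    counter = 0;
    for (long i = 0; i < iterations; i++)
        tls_outer();
    return counter;
}
```

Calling `run_plain(10000000)` and `run_tls(10000000)` from `main` and timing each (for example with `clock()`) gives a rough feel for the cost of the lookup. Absolute numbers will differ from the Lisp results above.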
Concerning the factor of 2 between "good" ECL and sbcl:
- ECL generates extra stores to global variables (probably
as part of debugging support)
- ECL generates extra checks
- ECL vectors use double indirection, sbcl uses simple indirection,
so ECL generated code needs more loads than sbcl code
- sbcl uses tail call optimization, gcc compiles ECL code to
normal calls
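The point about vector indirection can be pictured with two C layouts. This is only a sketch under my own assumptions: `boxed_vector` and `flat_vector` are hypothetical types, not the actual ECL or sbcl object layouts.

```c
/* Illustrative layouts only; these are hypothetical types, not the
   real ECL or sbcl vector representations. */
#include <stddef.h>
#include <stdlib.h>

/* Double indirection: the vector object points to separately stored data. */
typedef struct {
    size_t length;
    long *data;
} boxed_vector;

/* Single indirection: the elements follow the header directly. */
typedef struct {
    size_t length;
    long data[];            /* C99 flexible array member */
} flat_vector;

/* Allocate a boxed vector holding 0, 1, ..., n-1; NULL on failure. */
boxed_vector *boxed_new(size_t n)
{
    boxed_vector *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->data = malloc((n ? n : 1) * sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->length = n;
    for (size_t i = 0; i < n; i++)
        v->data[i] = (long)i;
    return v;
}

/* Allocate a flat vector holding 0, 1, ..., n-1; NULL on failure. */
flat_vector *flat_new(size_t n)
{
    flat_vector *v = malloc(sizeof *v + n * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->length = n;
    for (size_t i = 0; i < n; i++)
        v->data[i] = (long)i;
    return v;
}

/* Two dependent loads: first v->data, then the element. */
long boxed_ref(const boxed_vector *v, size_t i) { return v->data[i]; }

/* One load: the element address is computed directly from v. */
long flat_ref(const flat_vector *v, size_t i) { return v->data[i]; }
```

In a tight C loop a compiler can often keep `v->data` in a register, so the difference shows up mainly when the pointer has to be reloaded between accesses.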
At the machine code level sbcl code does not look great, but it
is performing much less work than ECL code.
I also tried on a slightly larger scale and compiled FriCAS using
single threaded ECL. Results are:

                ecl-23.9.9 default    ecl-23.9.9 no threads
Build:  real    12m27.006s            9m34.639s
        user    46m56.687s            38m55.354s
        sys     3m47.180s             2m29.636s
Tests:  real    2m33.963s             1m33.818s
        user    14m23.708s            7m34.652s
        sys     0m37.619s             0m20.587s

As you can see there is a substantial reduction in CPU time, and a
slightly smaller reduction in real time. Some of the reduction in
CPU time is likely because multithreaded ECL is running something
(probably garbage collection) in extra threads. The testsuite runs
in parallel, but there are some long running tests that increase
real time, so the relation between real time and CPU time is a bit
complicated.
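For anyone who wants to recompute relative reductions from the `time` output above, a small C helper along these lines is enough. The function names are my own and purely illustrative.

```c
/* Hypothetical helpers for comparing `time` results; names are
   illustrative only. */

/* Convert a "XmY.ZZZs" reading, given as minutes and seconds, to seconds. */
double to_seconds(int minutes, double seconds)
{
    return 60.0 * minutes + seconds;
}

/* Relative reduction from `before` to `after`, in percent. */
double reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

For example, `reduction_percent(to_seconds(46, 56.687), to_seconds(38, 55.354))` gives the relative drop in build user time between the two ECL builds.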
--
Waldek Hebisch