On Sat, Apr 13, 2024 at 12:54:22PM +0200, Dima Pasechnik wrote:
>
> But that's what ECL is doing, compiling to C and then calling a C compiler?
> Would it be more useful to figure out why the resulting FriCAS runs
> considerably slower than the one built using SBCL ?

On the social level it is clear: ECL has small development resources
and speed is not a top priority. SBCL descends from CMUCL, which
was designed for performance, and a lot of effort went into
improving performance.

On a more technical level some things have been known for a long time.
First, SBCL has a well-working profiler which is very helpful
in identifying bottlenecks. ECL folks basically say: use
C tools. IME C profilers work well for identifying
low-level troubles, but in machine-generated code slowness
is frequently spread out over a large area and C tools do not
work well. Second, there is a semantic mismatch between high-level
untyped operations offered by default in Lisp and the typed low-level
code needed for high performance. Spad code is typed and the Spad
compiler makes some effort to preserve types in its translation
to Lisp. SBCL does quite a good job of extracting low-level types
from Lisp code and, when such types are available, generates
reasonably good code (gcc can typically generate code that runs
2 times faster, but what the SBCL code generator is doing is not bad).
ECL is much worse: in many cases it generates calls to general
routines instead of directly doing the job. My benchmarking
indicated that function calls in ECL are much slower than
in SBCL. IIUC this is because Lisp semantics allows fancy
argument processing and merely checking that no such
processing is needed takes time and is relatively inefficient
in C. SBCL does the equivalent thing using machine code idioms
which are more efficient for this purpose. Today, an experiment
with parsing suggests that the ECL garbage collector may be
seriously mistuned for FriCAS use: in SBCL parsing took 55 seconds.
In ECL the FriCAS process ran out of memory after burning 38
hours of CPU time. During that time it apparently fully used
9 cores, so real time was about 4 hours. Since FriCAS code is
single-threaded, the only semi-reasonable use of multiple cores
is parallel garbage collection. It is possible that
ECL was heroically trying to fit data within its memory limit
(for a large part of the time the process size was of the order
of the ECL heap limit). But even if that were the case, it looks like
mistuning: when a program is at the edge of available memory
it is likely to fail in the future.

At least theoretically, the bad code generated by ECL could be
mitigated by directly generating C code from Spad. But a
mistuned/slow garbage collector means that we probably
should avoid ECL for heavier work.
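
As an illustration of the difference between calling general routines
and doing the job directly, here is a small C sketch. The tagged value
type and all function names are hypothetical, made up for this
example. It is neither ECL's real object representation nor what a
Spad-to-C translator would necessarily emit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged value, NOT ECL's real object representation. */
typedef enum { TAG_FIXNUM, TAG_DOUBLE } value_tag;

typedef struct {
    value_tag tag;
    union { long fix; double dbl; } as;
} value;

static value make_fixnum(long n)
{
    value v;
    v.tag = TAG_FIXNUM;
    v.as.fix = n;
    return v;
}

static value make_double(double d)
{
    value v;
    v.tag = TAG_DOUBLE;
    v.as.dbl = d;
    return v;
}

/* General routine: every call inspects the tags before adding.
   Overflow and bignum promotion are ignored in this sketch. */
static value generic_add(value a, value b)
{
    double x, y;

    if (a.tag == TAG_FIXNUM && b.tag == TAG_FIXNUM)
        return make_fixnum(a.as.fix + b.as.fix);

    x = (a.tag == TAG_FIXNUM) ? (double)a.as.fix : a.as.dbl;
    y = (b.tag == TAG_FIXNUM) ? (double)b.as.fix : b.as.dbl;
    return make_double(x + y);
}

/* Untyped style: one call to the general routine per element. */
static value sum_generic(const value *xs, size_t n)
{
    value acc = make_fixnum(0);
    size_t i;

    assert(xs != NULL || n == 0);
    for (i = 0; i < n; i++)
        acc = generic_add(acc, xs[i]);
    return acc;
}

/* Typed style: types are known, so the loop adds directly. */
static long sum_typed(const long *xs, size_t n)
{
    long acc = 0;
    size_t i;

    assert(xs != NULL || n == 0);
    for (i = 0; i < n; i++)
        acc += xs[i];
    return acc;
}
```

Both sums compute the same result on fixnum input, but only the typed
loop gives the C compiler something it can optimize without first
seeing through the tag dispatch.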

Long ago I informed Juan Jose Garcia-Ripoll, who was developing
ECL, about various problems and he was able to mitigate some
of them, increasing ECL speed about 4 times. But the core issues
remain and are harder to solve. And the current ECL group actually
decreased performance, at least in terms of CPU efficiency,
which leads to longer build/test times.
--
Waldek Hebisch