Hello Dima,

Some bottlenecks are known and I'm working on them. The worst offenders:

- the generic function dispatch is slow

  I have implemented the computation part of the fast generic function dispatch 
as proposed by Robert Strandh, but it needs to be integrated with the C compiler

- there is no type inference (only type propagation)

  ECL can go really fast when it knows about its types. I have plans for that, 
but some extra work needs to be done first

Both things hang on refactoring the compiler (that task is pending, see the 
cmpc-separation branch), which will make it easier to work with the 
intermediate representation and experiment with backends. There are also other 
motivations for this refactor.

There is also the fact that ECL /compilation/ is very slow. Currently there is 
not much we can do about this, because most of the time is spent in GCC (so 
there is nothing for us to optimize).

Another problem is FASL loading - when ECL loads a fasl, it replays the 
necessary side effects, and that is time consuming (you may notice this for 
example when you REQUIRE ASDF). This is not much of a problem in itself, but 
said side effects need to be replayed even when we build an executable, so the 
startup suffers. Other implementations hide that startup time by dumping 
images, where all side effects are already present.

Also, if you are not using the C compiler (i.e. only the bytecodes compiler), 
then the result is not optimized at all - the bytecodes compiler performs only 
minimal compilation.

All that said, when both fast gf dispatch and type inference are implemented, I 
will try to identify further bottlenecks if things still don't look good.

None of these possible improvements will be part of the upcoming release. We 
are currently in the testing phase (not thanks to me, I'm disappointingly not 
very active on this front at the moment - sorry Marius!).

Here are a few hints that will help you to produce better optimized code:
- avoid generic functions
- declare types wherever feasible
- lower safety to 1, raise speed to 3 (don't use safety 0, there are known bugs)
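
A minimal sketch of these hints in practice. SUM-SQUARES is a hypothetical 
function made up for illustration (it is not part of ECL), and how much this 
gains will depend on your code: it is an ordinary function rather than a 
generic one, the argument and accumulator types are declared, and the optimize 
settings follow the safety 1 / speed 3 advice.

```lisp
;; Hypothetical example; SUM-SQUARES is not an ECL function.
;; A plain DEFUN (no generic dispatch) with declared types.
(defun sum-squares (xs)
  (declare (optimize (speed 3) (safety 1))
           (type (simple-array double-float (*)) xs))
  (let ((acc 0d0))
    ;; Declaring the accumulator lets the compiler keep it as a double.
    (declare (type double-float acc))
    (dotimes (i (length xs) acc)
      (incf acc (* (aref xs i) (aref xs i))))))
```

Called as (sum-squares (make-array 3 :element-type 'double-float 
:initial-contents '(1d0 2d0 3d0))) this returns 14.0d0.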

There are also more mundane ways to improve the performance:
- inline partial dispatch tables for arithmetic operators
- work harder on IR to optimize it (using SSA and adding more passes is 
pending™)

Best regards,
Daniel

P.S. We should also introduce more immediate types on 64-bit platforms. We are 
currently using only two of the available bits for tagging while we could use 
three, and in that case single-float could be unboxed. I'm not working on that 
at the moment, though.

--
Daniel Kochmański ;; aka jackdaniel | Przemyśl, Poland
TurtleWare - Daniel Kochmański      | www.turtleware.eu

"Be the change that you wish to see in the world." - Mahatma Gandhi



------- Original Message -------
On Friday, July 14th, 2023 at 2:19 PM, Dima Pasechnik <dimpase+...@gmail.com> 
wrote:


> It's well-known that ECL-compiled CL projects are considerably slower
> than ones where SBCL is used. Examples are e.g. Maxima, FriCAS - there
> speed might be few times (sic!) slower.
> 
> Is there an effort to find out bottlenecks, or is it known where these
> bottlenecks are?
> 
> Best,
> Dima
