On Tue, 3 Feb 2004, Raymond Toy wrote:

>     Thomas> (1) Yes, there _is_ a speed gain by implementing evaluation 
>     Thomas> "at compile time" - which may seem a bit strange, considering the 
>     Thomas> tremendous amount of instruction-level parallelism in current 
>     Thomas> processors, but you can expect roughly 20% for a 10th-order 
>     Thomas> polynomial with nonsparse coefficients.
> 
> How did you arrive at this conclusion?  CMUCL with and without
> "compile-time" evaluation?  Really just curious.

Basically yes.
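
To make the comparison concrete, here is a minimal sketch of the two
variants. This is not the code that was actually timed; the names
HORNER-RUNTIME and HORNER-POLY are made up for illustration, and the sketch
assumes double-float arguments and literal coefficients given highest order
first.

```lisp
;; Run-time variant: coefficients live in an array and are fetched
;; in a loop on every call.
(defun horner-runtime (x coeffs)
  (declare (type double-float x)
           (type (simple-array double-float (*)) coeffs))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (length coeffs) acc)
      (setf acc (+ (* acc x) (aref coeffs i))))))

;; "Compile-time" variant (hypothetical macro): Horner's rule is unrolled
;; at macroexpansion time, so the compiler sees only a nested arithmetic
;; expression with constant coefficients and no array references.
(defmacro horner-poly (x &rest coeffs)
  (let ((var (gensym "X")))
    `(let ((,var ,x))
       (declare (type double-float ,var))
       ,(reduce (lambda (acc c) `(+ (* ,acc ,var) ,c))
                (rest coeffs)
                :initial-value (first coeffs)))))
```

For example, (horner-poly x 3d0 2d0 1d0) expands into
(+ (* (+ (* 3d0 x) 2d0) x) 1d0), with x bound once to a fresh variable.
Any speed difference between the two forms will depend on the compiler and
the machine, so it has to be measured rather than assumed.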

> CMUCL isn't particularly smart about array references and doesn't do
> any kind of instruction scheduling, so I would guess (wildly) that
> cmucl would not beat C++.  Maybe equal, but not faster.  Unless you
> take advantage of some special structure of the polynomial, perhaps.

Well, at least the code it generates is not overly stupid, and on 
reasonably modern superscalar/superpipelined processor architectures(*), a 
lot of out-of-order execution is already scheduled dynamically, directly in 
silicon ( -> Tomasulo algorithm).

(*) i.e. _not_ the Itanium.


-- 
regards,               [EMAIL PROTECTED]              (o_
 Thomas Fischbacher -  http://www.cip.physik.uni-muenchen.de/~tf  //\
(lambda (n) ((lambda (p q r) (p p q r)) (lambda (g x y)           V_/_
(if (= x 0) y (g g (- x 1) (* x y)))) n 1))                  (Debian GNU)
