Yo,
I'm back again with some optimization questions. I stumbled across
this by pure chance: I have a rather complicated function that needs
some fresh arrays on each invocation. The function is also called
quite often, i.e. fresh arrays are needed quite often. I created these
arrays with MAKE-ARRAY and, I don't know why, after a while I had the
idea to use templates of my arrays instead and use COPY-SEQ to create
the fresh arrays. And, presto, the code runs much faster now. Here's a
simple example:
  (defun foo (n x)
    (dotimes (i n)
      (let ((a (make-array x :initial-element 0)))
        (setf (svref a 4) 42))))

  (defun bar (n x)
    (let ((template (make-array x :initial-element 0)))
      (dotimes (i n)
        (let ((a (copy-seq template)))
          (setf (svref a 4) 42)))))
The compiled code yields these results in CMUCL 18d:
  * (time (foo 1000000 10))
  Compiling LAMBDA NIL:
  Compiling Top-Level Form:

  Evaluation took:
    7.01 seconds of real time
    6.805664 seconds of user run time
    0.171875 seconds of system run time
    [Run times include 0.54 seconds GC run time]
    0 page faults and
    95666256 bytes consed.
  NIL
  * (time (bar 1000000 10))
  Compiling LAMBDA NIL:
  Compiling Top-Level Form:

  Evaluation took:
    1.77 seconds of real time
    1.606445 seconds of user run time
    0.147461 seconds of system run time
    [Run times include 0.33 seconds GC run time]
    0 page faults and
    63763840 bytes consed.
  NIL
  *
I find this rather odd 'cause I would have expected FOO to be at
least as fast as BAR. That is indeed the case in the other CL
implementations I've tested (ACL, CLISP, LW), except that LW favors
BAR for small N.
Some other observations:
1. This "optimization" only holds for small X. On my machine BAR is
   faster for X below roughly 100, while FOO wins for bigger X (which
   is perfect for my app).
2. Also, the speed gain is lost if X is fixed, i.e. written as a
   literal constant in the code:

     (defun foo (n)
       (dotimes (i n)
         (let ((a (make-array 10 :initial-element 0)))
           (setf (svref a 4) 42))))

     (defun bar (n)
       (let ((template (make-array 10 :initial-element 0)))
         (dotimes (i n)
           (let ((a (copy-seq template)))
             (setf (svref a 4) 42)))))
In this case FOO always wins.
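
In case anyone wants to check where the crossover from observation 1
lies on their own machine, here is a rough sketch of how the two
variants could be timed over a range of X. RUN-TIME-OF and COMPARE-AT
are names I made up for this post, and the functions passed in are
assumed to take (N X) like the first FOO and BAR above. It only uses
standard CL, so it should behave the same in any implementation, but
treat the numbers as indicative at best.

```lisp
;; Sketch only: RUN-TIME-OF and COMPARE-AT are hypothetical helper names.
;; FN is assumed to be a function of two arguments (N X), like the
;; two-argument FOO and BAR from the first example.
(defun run-time-of (fn n x)
  "Call FN with N and X and return the run time used, in seconds."
  (let ((start (get-internal-run-time)))
    (funcall fn n x)
    (/ (- (get-internal-run-time) start)
       (float internal-time-units-per-second))))

(defun compare-at (fn-a fn-b n sizes)
  "Print the run times of FN-A and FN-B for each array size in SIZES."
  (dolist (x sizes)
    (format t "~&X = ~4D  first: ~,3F s  second: ~,3F s~%"
            x
            (run-time-of fn-a n x)
            (run-time-of fn-b n x))))
```

With the two-argument versions compiled and loaded, something like
(compare-at #'foo #'bar 1000000 '(10 50 100 200 500)) prints one line
per size. Note that X must be at least 5 because both functions write
to index 4.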
I'm pretty happy that this behaviour means a significant speed boost
for my app but nevertheless I'd be interested to know what exactly is
happening here.
Thanks,
Edi.