On Sun, Feb 23, 2003 at 03:33:53PM -0500, rif wrote:
> I just did my own little experiment (I can't post the source, as my
> computer with CMUCL isn't networked today, but I'll post it tomorrow).
> I consider on the one hand, a function to sum up the elements of a
> (simple-array double-float), and on the other, a function that sums up
> the double-float elements of a vector of f-d's, where an f-d is a
> structure containing a fixnum and a double-float.
> [snip]
> I found
> that the structure version was about 5.5 times slower than the
> (simple-array double-float) version.
>
> A hit of 2.5 seems quite different from 5.5, although 2.5 is still
> enough that I'd be somewhat loath to write that way.
Here's my own stab at it (put the following forms in a file, compile
it, then load...):

(defstruct foo
  (a 0d0 :type double-float)
  (b 0 :type fixnum))

(defparameter *array-of-foos*
  (let ((array (make-array '(1000)
                           :element-type 'foo
                           :initial-element (make-foo))))
    (loop for j from 0 below (length array)
          do (setf (aref array j) (make-foo :a 1d0)))
    array))

(defparameter *array-of-double*
  (make-array '(1000)
              :element-type 'double-float
              :initial-element 1d0))

(defun test-1 (array)
  (declare (optimize (speed 3) (safety 0) (debug 0))
           (type (simple-array double-float) array))
  (let ((sum 0d0))
    (declare (double-float sum))
    (dotimes (i 100000)
      (loop for j of-type fixnum from 0 below (length array)
            do (incf sum (aref array j))))
    sum))

(defun test-2 (array)
  (declare (optimize (speed 3) (safety 0) (debug 0))
           (type (simple-array foo) array))
  (let ((sum 0d0))
    (dotimes (i 100000)
      (declare (double-float sum))
      (loop for j of-type fixnum from 0 below (length array)
            do (incf sum (foo-a (aref array j)))))
    sum))

* (load "foo.x86f")
; Loading #p"/home/ggarza/foo.x86f".
T
* (lisp-implementation-version)
"18d"
* (time (test-1 *array-of-double*))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
Evaluation took:
  0.51 seconds of real time
  0.51 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* (time (test-2 *array-of-foos*))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
Evaluation took:
  0.52 seconds of real time
  0.52 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
*
The inner loop for TEST-1 is:
      75: L1:   FSTPD FR1
      77:       FLDD  [EDX+EBX*2+1]
      7B:       FXCH  FR1
      7D:       FADDD FR1
      7F:       ADD   EBX, 4
      82: L2:   CMP   EBX, ECX
      84:       JL    L1
And for TEST-2 it's:
      7D: L1:   MOV   ESI, [EDX+EBX+1]
      81:       MOV   ESI, [ESI+3]
      84:       FSTPD FR1
      86:       FLDD  [ESI+1]
      89:       FXCH  FR1
      8B:       FADDD FR1
      8D:       ADD   EBX, 4
      90: L2:   CMP   EBX, ECX
      92:       JL    L1
Pretty tight!
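
Listings like the two above can be obtained with the standard
DISASSEMBLE function. The following is only a minimal sketch:
SUM-DOUBLES is a hypothetical, scaled-down stand-in for TEST-1, and
the exact instructions printed will differ between implementations,
versions and platforms.

```lisp
;; SUM-DOUBLES is a hypothetical name used only for this sketch.
(defun sum-doubles (array)
  (declare (optimize (speed 3) (safety 0) (debug 0))
           (type (simple-array double-float (*)) array))
  (let ((sum 0d0))
    (declare (double-float sum))
    (loop for j of-type fixnum from 0 below (length array)
          do (incf sum (aref array j)))
    sum))

;; DISASSEMBLE is standard Common Lisp.  It prints an
;; implementation-specific listing to *STANDARD-OUTPUT*.
(disassemble 'sum-doubles)
```

The inner loop is usually the short run of instructions ending in a
backward conditional jump, as in the listings above.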
Looks like the test I did before the last message was incorrect
(or this one is). It looks like the overhead is even tinier
than I thought....
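
One plausible reading of the extra MOV instructions in the TEST-2
loop is that an array of structures stores references to the
structures rather than the floats themselves, so each element has to
be fetched through a pointer. A quick way to check what an array
really stores is ARRAY-ELEMENT-TYPE. This is a sketch only: CELL,
*CELLS* and *DOUBLES* are hypothetical names, and the reported types
are implementation-dependent (on CMUCL one would expect T for the
structure array and DOUBLE-FLOAT for the other).

```lisp
;; CELL, *CELLS* and *DOUBLES* are hypothetical names for this sketch.
(defstruct cell
  (a 0d0 :type double-float))

(defparameter *cells*
  (make-array 4 :element-type 'cell
                :initial-element (make-cell :a 1d0)))

(defparameter *doubles*
  (make-array 4 :element-type 'double-float
                :initial-element 1d0))

;; ARRAY-ELEMENT-TYPE reports the actual (upgraded) element type,
;; which may be more general than the type that was requested.
(format t "cells:   ~S~%" (array-element-type *cells*))
(format t "doubles: ~S~%" (array-element-type *doubles*))
```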
> > Also, don't forget to declare the structure accessors inline. :)
>
> Is it possible that I'm somehow not getting these inline, explaining
> the difference between the 2.5 and 5.5 factors?
As Gerd pointed out, CMUCL should inline them automatically. I was
confused because when I "compile definition"'d the defstruct and a
function that used it with ilisp, CMUCL wasn't inlining it without a
declaration for some reason. COMPILE-FILE seems to work fine with
it, though.
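
For environments where the accessors do not get inlined on their own,
an explicit declaration can be sketched as follows. POINT, POINT-X,
POINT-N and SUM-X are hypothetical names, and whether the declaration
changes anything depends on the implementation and on how the forms
are compiled.

```lisp
;; Proclaim the accessors inline before the DEFSTRUCT that defines
;; them, so that later compiled calls are allowed to be open-coded.
(declaim (inline point-x point-n))

(defstruct point
  (x 0d0 :type double-float)
  (n 0 :type fixnum))

(defun sum-x (points)
  (declare (optimize (speed 3))
           (type simple-vector points))
  (let ((sum 0d0))
    (declare (double-float sum))
    (loop for p across points
          do (incf sum (point-x p)))
    sum))
```

Disassembling SUM-X and looking for a full call to POINT-X in the
loop is one way to tell whether the accessor was actually inlined.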
Gabe Garza