On Sun, Feb 23, 2003 at 03:33:53PM -0500, rif wrote:
> I just did my own little experiment (I can't post the source, as my
> computer with CMUCL isn't networked today, but I'll post it tomorrow).
> I consider on the one hand, a function to sum up the elements of a
> (simple-array double-float), and on the other, a function that sums up
> the double-float elements of a vector of f-d's, where an f-d is a
> structure containing a fixnum and a double-float.  
> [snip] 
> I found
> that the structure version was about 5.5 times slower than the
> (simple-array double-float) version.
> 
> A hit of 2.5 seems quite different from 5.5, although 2.5 is still
> enough that I'd be somewhat loath to write that way.

Here's my own stab at it (put the following forms in a file, compile
it, then load...):

(defstruct foo 
  (a 0d0 :type double-float)
  (b 0 :type fixnum))

(defparameter *array-of-foos*
  (let ((array (make-array '(1000)
                           :element-type 'foo
                           :initial-element (make-foo))))
    (loop for j from 0 below (length array)
      do (setf (aref array j) (make-foo :a 1d0)))
    array))

(defparameter *array-of-double*
  (make-array '(1000)
              :element-type 'double-float
              :initial-element 1d0))

(defun test-1 (array)
  (declare (optimize (speed 3) (safety 0) (debug 0))
           (type (simple-array double-float) array))
  (let ((sum 0d0))
    (declare (double-float sum))
    (dotimes (i 100000)
      (loop for j of-type fixnum from 0 below (length array)
        do (incf sum (aref array j))))
    sum))

(defun test-2 (array)
  (declare (optimize (speed 3) (safety 0) (debug 0))
           (type (simple-array foo) array))
  (let ((sum 0d0))
    (dotimes (i 100000)
      (declare (double-float sum))
      (loop for j of-type fixnum from 0 below (length array)
        do (incf sum (foo-a (aref array j)))))
    sum))


* (load "foo.x86f")
; Loading #p"/home/ggarza/foo.x86f".
T
* (lisp-implementation-version)
"18d"
* (time (test-1 *array-of-double*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  0.51 seconds of real time
  0.51 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* (time (test-2 *array-of-foos*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  0.52 seconds of real time
  0.52 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* 

The inner loop for TEST-1 is:

 75: L1:   FSTPD FR1
 77:       FLDD  [EDX+EBX*2+1]
 7B:       FXCH  FR1
 7D:       FADDD FR1
 7F:       ADD   EBX, 4
 82: L2:   CMP   EBX, ECX
 84:       JL    L1

And for TEST-2 it's:

 7D: L1:   MOV   ESI, [EDX+EBX+1]
 81:       MOV   ESI, [ESI+3]
 84:       FSTPD FR1
 86:       FLDD  [ESI+1]
 89:       FXCH  FR1
 8B:       FADDD FR1
 8D:       ADD   EBX, 4
 90: L2:   CMP   EBX, ECX
 92:       JL    L1

Pretty tight!

Looks like the test I did before the last message was incorrect
(or this one is).  It looks like the overhead is even tinier
then I thought....

> > Also, don't forget to declare the structure accessors inline. :)
> 
> Is it possible that I'm somehow not getting these inline, explaining
> the difference between the 2.5 and 5.5 factors?

As Gerd pointed out, CMUCL should inline them automatically.  I was
confused because when I "compile definition"'d the defstruct and a
function that used it with ilisp, CMUCL wasn't inlining it without a 
declaration for some reason.  COMPILE-FILE seems to work fine with
it, though.

Gabe Garza

Reply via email to