Re: optimization question on arrays of types

rif Mon, 24 Feb 2003 07:30:11 -0800

It seems that the relative difference in speed of access between an
array of double-floats and an array of structures depends strongly on
the size of the arrays in question.  On my machine, your code as
written:


* (time (test-1 *array-of-double*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  0.52 seconds of real time
  0.52 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* (time (test-2 *array-of-foos*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  0.86 seconds of real time
  0.85 seconds of user run time
  0.01 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8

More difference than you observed, but still not too large.  Now I
change the code so that the arrays are length 1,000,000 instead of
length 1,000, and the dotimes in each function is only 100.  New
runtimes:

* (time (test-1 *array-of-double*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  0.94 seconds of real time
  0.94 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* (time (test-2 *array-of-foos*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  4.4 seconds of real time
  4.37 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* 

As the arrays get longer, both versions get slower, but the structure
version becomes much slower in relative terms.  I conjecture that this
is because of cache miss effects.  Supporting this idea, making the
structure bigger exacerbates the effect --- if we redefine foo as

(defstruct foo 
  (a 0d0 :type double-float)
  (b 0 :type fixnum)
  (c (make-array 10 :initial-element 0d0 :element-type 'double-float)
     :type (simple-array double-float (10))))

and rerun test-2:

* (time (test-2 *array-of-foos*))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  7.23 seconds of real time
  7.2 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.0d+8
* 

Whether this is a problem in an application depends on how big your
arrays are (and I'd guess on how much else is going on to stress the
cache in the function in question), but I think it does show that it's
possible for the difference in timings between arrays of structures and
directly coded arrays to be quite large.

rif

> On Sun, Feb 23, 2003 at 03:33:53PM -0500, rif wrote:
> > I just did my own little experiment (I can't post the source, as my
> > computer with CMUCL isn't networked today, but I'll post it tomorrow).
> > I consider on the one hand, a function to sum up the elements of a
> > (simple-array double-float), and on the other, a function that sums up
> > the double-float elements of a vector of f-d's, where an f-d is a
> > structure containing a fixnum and a double-float.  
> > [snip] 
> > I found
> > that the structure version was about 5.5 times slower than the
> > (simple-array double-float) version.
> > 
> > A hit of 2.5 seems quite different from 5.5, although 2.5 is still
> > enough that I'd be somewhat loath to write that way.
> 
> Here's my own stab at it (put the following forms in a file, compile
> it, then load...):
> 
> (defstruct foo 
>   (a 0d0 :type double-float)
>   (b 0 :type fixnum))
> 
> (defparameter *array-of-foos*
>   (let ((array (make-array '(1000)
>                          :element-type 'foo
>                          :initial-element (make-foo))))
>     (loop for j from 0 below (length array)
>       do (setf (aref array j) (make-foo :a 1d0)))
>     array))
> 
> (defparameter *array-of-double*
>   (make-array '(1000)
>             :element-type 'double-float
>             :initial-element 1d0))
> 
> (defun test-1 (array)
>   (declare (optimize (speed 3) (safety 0) (debug 0))
>          (type (simple-array double-float) array))
>   (let ((sum 0d0))
>     (declare (double-float sum))
>     (dotimes (i 100000)
>       (loop for j of-type fixnum from 0 below (length array)
>       do (incf sum (aref array j))))
>     sum))
> 
> (defun test-2 (array)
>   (declare (optimize (speed 3) (safety 0) (debug 0))
>          (type (simple-array foo) array))
>   (let ((sum 0d0))
>     (dotimes (i 100000)
>       (declare (double-float sum))
>       (loop for j of-type fixnum from 0 below (length array)
>       do (incf sum (foo-a (aref array j)))))
>     sum))
> 
> 
> * (load "foo.x86f")
> ; Loading #p"/home/ggarza/foo.x86f".
> T
> * (lisp-implementation-version)
> "18d"
> * (time (test-1 *array-of-double*))
> Compiling LAMBDA NIL: 
> Compiling Top-Level Form: 
> 
> Evaluation took:
>   0.51 seconds of real time
>   0.51 seconds of user run time
>   0.0 seconds of system run time
>   0 page faults and
>   0 bytes consed.
> 1.0d+8
> * (time (test-2 *array-of-foos*))
> Compiling LAMBDA NIL: 
> Compiling Top-Level Form: 
> 
> Evaluation took:
>   0.52 seconds of real time
>   0.52 seconds of user run time
>   0.0 seconds of system run time
>   0 page faults and
>   0 bytes consed.
> 1.0d+8
> * 
> 
> The inner loop for TEST-1 is:
> 
>  75: L1:   FSTPD FR1
>  77:       FLDD  [EDX+EBX*2+1]
>  7B:       FXCH  FR1
>  7D:       FADDD FR1
>  7F:       ADD   EBX, 4
>  82: L2:   CMP   EBX, ECX
>  84:       JL    L1
> 
> And for TEST-2 it's:
> 
>  7D: L1:   MOV   ESI, [EDX+EBX+1]
>  81:       MOV   ESI, [ESI+3]
>  84:       FSTPD FR1
>  86:       FLDD  [ESI+1]
>  89:       FXCH  FR1
>  8B:       FADDD FR1
>  8D:       ADD   EBX, 4
>  90: L2:   CMP   EBX, ECX
>  92:       JL    L1
> 
> Pretty tight!
> 
> Looks like the test I did before the last message was incorrect
> (or this one is).  It looks like the overhead is even tinier
> then I thought....
> 
> > > Also, don't forget to declare the structure accessors inline. :)
> > 
> > Is it possible that I'm somehow not getting these inline, explaining
> > the difference between the 2.5 and 5.5 factors?
> 
> As Gerd pointed out, CMUCL should inline them automatically.  I was
> confused because when I "compile definition"'d the defstruct and a
> function that used it with ilisp, CMUCL wasn't inlining it without a 
> declaration for some reason.  COMPILE-FILE seems to work fine with
> it, though.
> 
> Gabe Garza

Re: optimization question on arrays of types

Reply via email to