|
Alan,
(suppose this is relevant to the list
too)
I re-read your email just now. I always try to
allocate the full array (2-dimensional or n-dimensional) at the point of
declaration. I had an incle-ing that this was the best method, especailly when
filling the array, but I was not aware that it would speed up reading of the
array significantly.
Remebering back to my High Performance computing
lectures, I think that when the array is allocated "slice by slice" there are
problems with cache misses. I remember something about accessing in stride in
order to achinve maximum cache hits.
I suspect that when you specify bounds at creation
the memory structure allocated is (x*y*bits per cell), I.E. the data is stored
in one big chunk and accessed using a single index:
memory_location = start_point +
x_index + (y_index * y_size)
On the other hand if you are specifying the size of
each strip (perhaps in a loop) then adding them to another 1-d array of arrays
(pointers to arrays in fact) then the memory locations dont have to be, and
rarely will be contiguous. Cache misses will occour each time you access a
new 1-d strip. The memroy location of a cell is probably calculated using
something like this:
memory_location =
array_of_start_points [y_index] + (x * x_size)
The penalty caused by this type of accessing would
be greatest for arrays of large y and small x bounds. Conversly smally y
and large x upper bounds would have a small cache miss penalty.
Im not sure if the retrieval of the start point (in
memory) for that particuar row would be cached, but I guess this complexity
could only cause slow-downs. I think this speed up trick chops this
"dereferencing" stage out, therefore yeilding an improvement, but this
improvement would obviously not be useful when the array's bounds are specified
at creation.
I will use your app as a testbed to check my
thoughts (these are only guesses) soon.
N
----- Original Message -----
|
Title: Re: improving performance in access to multidimensional array elements
