On Wed, Apr 05, 2023 at 10:34:22PM +0000, Paul via Digitalmars-d-learn wrote:
> On Tuesday, 4 April 2023 at 22:20:52 UTC, H. S. Teoh wrote:
> 
> > Best practices for arrays in hot loops:
[...]
> > - Where possible, prefer sequential access over random access (take
> >   advantage of the CPU cache hierarchy).
> 
> Thanks for sharing Teoh!  Very helpful.
> 
> would this be random access? for(size_t i; i<arr.length; i++) using
> indices?  ...and this be sequential foreach(a;arr) ?
> 
> or would they have to be completely different kinds of containers?  a
> dlang 'range' vs arr[]?
[...]

The exact syntactic construct you use is not important; under the hood,
for(i; i<arr.length; i++) and foreach(a;arr) are exactly the same thing.
What's important is the actual pattern of memory access.  Looping from
the first element of an array to the last element is sequential access;
doing binary search on an array is random access (because it's
unpredictable where the next access will be). Traversing a linked list
is also random access, because in general individual nodes will be
stored in different places in memory. So is a hashtable lookup.
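To make the distinction concrete, here's a rough sketch in D (function names are just for illustration): the first loop touches consecutive addresses, while each probe of the binary search lands at an address the hardware can't predict:

```d
long sumSequential(const(int)[] arr)
{
    // Sequential: the address advances by int.sizeof every
    // iteration, so the prefetcher can stay ahead of the loop.
    long sum = 0;
    foreach (x; arr)    // essentially the same as for(size_t i; i<arr.length; i++)
        sum += x;
    return sum;
}

bool binarySearch(const(int)[] arr, int needle)
{
    // Random: each probe jumps roughly half the remaining range,
    // so the next address is unpredictable to the hardware.
    size_t lo = 0, hi = arr.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (arr[mid] < needle)      lo = mid + 1;
        else if (arr[mid] > needle) hi = mid;
        else return true;
    }
    return false;
}
```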

Basically, modern CPUs have a cache hierarchy and memory prefetching
controlled by the access predictor. When the predictor sees memory being
accessed sequentially, it can speed up future accesses by preloading
subsequent blocks of memory into cache lines so that when the CPU next
tries to read from that location it's already in the cache and it
doesn't have to wait for a slow fetch cycle from RAM.  The problem with
structures like hashtables is that the next access is unpredictable, so
the CPU's predictor cannot preload it for you, and you incur a full
fetch from RAM almost every time.
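A classic way to see this effect is to walk the same 2D data in two
different orders (a sketch, not a benchmark; N is arbitrary):

```d
enum N = 1024;

double sumRowMajor(double[][] m)
{
    // Each row is contiguous in memory, so the inner loop walks
    // memory sequentially: cache- and prefetcher-friendly.
    double s = 0;
    foreach (i; 0 .. N)
        foreach (j; 0 .. N)
            s += m[i][j];
    return s;
}

double sumColMajor(double[][] m)
{
    // Same arithmetic, but every access jumps to a different row,
    // defeating the prefetcher; for large N this is much slower.
    double s = 0;
    foreach (j; 0 .. N)
        foreach (i; 0 .. N)
            s += m[i][j];
    return s;
}
```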

Of course, this doesn't mean that hashtable (or other data structure)
lookups are necessarily bad.  If the alternative is linear scan through
very large arrays, then hashtables will speed up your inner loop in
spite of the cache misses. So it depends on your exact use case. But if
your arrays are not overly large, scanning through the entire array may
sometimes be faster than doing many hashtable lookups.  Some modern
hashtable implementations are also adapted to take advantage of the
cache hierarchy, so it may not be as bad as a naïve implementation.
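For instance (a sketch only; the crossover point depends on element size
and hardware, so measure before committing to either):

```d
import std.algorithm.searching : canFind;

// Small array: a linear scan stays within a few cache lines and
// often beats computing a hash plus a pointer chase.
bool lookupSmall(const(int)[] arr, int x)
{
    return arr.canFind(x);
}

// Large collection: the built-in associative array's O(1) lookup
// wins despite the likely cache miss per query.
bool lookupLarge(const bool[int] set, int x)
{
    return (x in set) !is null;
}
```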

Nevertheless, the general principle is that locality of reference
(successive memory accesses land close to each other) is good, and
should generally be preferred over strategies that require jumping
around in memory. So your
data structures and algorithms should be designed in a way that takes
advantage of linear access where possible.


T

-- 
In theory, there is no difference between theory and practice.
