On Friday, 30 January 2015 at 14:41:11 UTC, Laeeth Isharc wrote:
Thanks, Adam. That's what I had thought (your first paragraph), but something Ola said on a different thread confused me and made me think I didn't understand it, so I wanted to pin it down.

There are always significant optimization effects in long-running loops:
- SIMD
- cache locality / prefetching

For the former (SIMD) you need to make sure good code is generated, either by hand, by using vectorized libraries, or via auto-vectorization.

For the latter (cache) you need to make sure the prefetcher can predict your access pattern (or is told to prefetch explicitly), and that the working set is small enough to stay in the faster cache levels.

If you want good performance you cannot ignore any of these, and you have to design your data structures and algorithms for them. Prefetching has to happen maybe 100 instructions before the actual load from memory, and AVX wants 32-byte alignment and a data layout that fits the algorithm. On the next-gen Xeon Skylake I think the alignment requirement might go up to 64 bytes, and you get 512-bit-wide registers (so you can do eight 64-bit floating-point operations in parallel per register, per core). The difference between issuing 1-4 ops and issuing 8-16 per time unit is noticeable...

And of course, the closer your code gets to the CPU's theoretical throughput, the more critical it becomes not to wait for memory loads.

This is also a moving target...
