Don wrote:
The next D2 runtime will include my cache-size detection code. This
makes it possible to write a cache-aware memcpy, using (for example)
non-temporal writes when the arrays being copied exceed the size of the
largest cache.
In my tests, it gives a speed-up of approximately 2X in such cases.
The downside is, it's a fair bit of work to implement, and it only
affects extremely large arrays, so I fear it's basically irrelevant (it
probably won't help arrays < 32K in size). Do people actually copy
megabyte-sized arrays?
Is it worth spending any more time on it?
Well, arrays > 32K aren't that unusual, especially in scientific computing.
Even a small 200x200 matrix takes up 40000*8 = 320000 bytes (assuming
8-byte elements such as doubles).
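
To make the idea concrete, here is a rough sketch in C (not D, and not Don's actual code) of what a cache-aware copy could look like. The names cache_aware_copy, matrix_bytes and the largest_cache_bytes parameter are made up for illustration; a real implementation would get the cache size from the runtime's detection code instead of having it passed in. It assumes an x86 compiler that defines __SSE2__ and falls back to plain memcpy everywhere else.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: size in bytes of a rows x cols matrix. */
size_t matrix_bytes(size_t rows, size_t cols, size_t elem_size)
{
    return rows * cols * elem_size;
}

/*
 * Hypothetical cache-aware copy. largest_cache_bytes is an assumed
 * parameter standing in for the detected size of the largest cache.
 * Copies larger than that use non-temporal (streaming) stores, which
 * bypass the cache; everything else goes through ordinary memcpy.
 */
void *cache_aware_copy(void *dst, const void *src, size_t n,
                       size_t largest_cache_bytes)
{
#if defined(__SSE2__)
    if (n > largest_cache_bytes) {
        unsigned char *d = (unsigned char *)dst;
        const unsigned char *s = (const unsigned char *)src;

        /* Streaming stores need a 16-byte aligned destination, so copy
           the unaligned head normally first. */
        size_t head = (16 - ((uintptr_t)d & 15)) & 15;
        if (head > n)
            head = n;
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;

        /* Main loop: unaligned load, non-temporal 16-byte store. */
        while (n >= 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)s);
            _mm_stream_si128((__m128i *)d, v);
            d += 16;
            s += 16;
            n -= 16;
        }

        /* Make the streaming stores visible before returning. */
        _mm_sfence();

        /* Copy whatever is left (fewer than 16 bytes). */
        memcpy(d, s, n);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

For the 200x200 matrix above, matrix_bytes(200, 200, 8) gives 320000, so with a threshold of 32K the streaming path would be taken on an SSE2 build. Whether that is actually faster depends on the machine and on whether the destination is read again soon afterwards, so it would need measuring.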