Re: Is 2X faster large memcpy interesting?

JC Fri, 27 Mar 2009 09:15:27 -0700

The applications that I write usually work with matrices of size 600x600 up to 2000x2000, and since they are doubles, that is a good chunk of memory (roughly 2.9 MB to 32 MB per matrix).

Unleash the optimizations!
JC

Don wrote:

The next D2 runtime will include my cache-size detection code. This makes it possible to write a cache-aware memcpy, using (for example) non-temporal writes when the arrays being copied exceed the size of the largest cache.
In my tests, it gives a speed-up of approximately 2X in such cases.
The downside is, it's a fair bit of work to implement, and it only affects extremely large arrays, so I fear it's basically irrelevant (it probably won't help arrays < 32K in size). Do people actually copy megabyte-sized arrays?
Is it worth spending any more time on it?
BTW: I tested the memcpy() code provided in AMD's 1992 optimisation manual, and in Intel's 2007 manual. Only one of them actually gave any benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't Intel!) I've noticed that AMD's docs are usually greatly superior to Intel's, but this time the difference is unbelievable.
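
For readers who want to see the shape of the idea, here is a minimal sketch in C of a copy that switches to non-temporal stores above a size threshold. It is not Don's implementation and not the code from either vendor manual. The name cache_aware_memcpy and the fixed 4 MiB threshold are assumptions made for illustration; a real version would take the threshold from the runtime's cache-size detection. On targets without SSE2 the sketch simply falls back to memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Assumption: stands in for the detected size of the largest cache. */
#define ASSUMED_LARGEST_CACHE_BYTES ((size_t)4u * 1024u * 1024u)

/* Hypothetical helper: copies n bytes from src to dst (non-overlapping).
   Large copies use non-temporal stores so the destination does not
   evict useful data from the cache; small copies use plain memcpy. */
void *cache_aware_memcpy(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    if (n >= ASSUMED_LARGEST_CACHE_BYTES) {
        unsigned char *d = (unsigned char *)dst;
        const unsigned char *s = (const unsigned char *)src;

        /* Streaming stores need a 16-byte aligned destination, so copy
           the few leading bytes normally until d is aligned. */
        size_t head = (size_t)(-(uintptr_t)d & 15u);
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;

        /* Main loop: unaligned load from src, streaming store to dst. */
        size_t blocks = n / 16;
        for (size_t i = 0; i < blocks; i++) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + 16 * i));
            _mm_stream_si128((__m128i *)(d + 16 * i), v);
        }

        /* Make the streaming stores globally visible before returning. */
        _mm_sfence();

        /* Copy the tail that did not fill a whole 16-byte block. */
        size_t done = blocks * 16;
        memcpy(d + done, s + done, n - done);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

Whether this beats the library memcpy depends on the CPU and on the threshold chosen, so it should be benchmarked on the target machine rather than assumed to give the 2X figure quoted above.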
