Re: Is 2X faster large memcpy interesting?

Georg Wrede Thu, 26 Mar 2009 13:50:20 -0700

Don wrote:

The next D2 runtime will include my cache-size detection code. This makes it possible to write a cache-aware memcpy, using (for example) non-temporal writes when the arrays being copied exceed the size of the largest cache.
In my tests, it gives a speed-up of approximately 2X in such cases.
The downside is, it's a fair bit of work to implement, and it only affects extremely large arrays, so I fear it's basically irrelevant (it probably won't help arrays < 32K in size). Do people actually copy megabyte-sized arrays?
Is it worth spending any more time on it?
BTW: I tested the memcpy() code provided in AMD's 1992 optimisation manual, and in Intel's 2007 manual. Only one of them actually gave any benefit when run on a 2008 Intel Core2 -- which was it? (Hint: it wasn't Intel!) I've noticed that AMD's docs are usually greatly superior to Intel's, but this time the difference is unbelievable.

What's the alternative? What would you do instead? Is there something cooler or more important for D to do?


(IMHO, if the other alternatives have any merit, then I'd vote for them.)

But then again, you've already invested in this, and it clearly interests you. Laborious, yes, but it sounds fun.
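
For anyone following along, here is a rough sketch of the kind of cache-aware copy being discussed. It is not Don's code and not what the D runtime does; it is plain C with SSE2 intrinsics, and both the function name cache_aware_memcpy and the 2 MB threshold ASSUMED_LARGEST_CACHE_BYTES are made up for illustration (a real version would take the threshold from the cache-size detection code). Above the threshold it uses non-temporal (streaming) stores so the destination does not evict useful data from the cache; below it, or on a target without SSE2, it just calls memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Assumed placeholder value. A real runtime would detect the size of the
   largest cache at startup instead of hard-coding it. */
#define ASSUMED_LARGEST_CACHE_BYTES ((size_t)(2u * 1024u * 1024u))

/* Hypothetical helper: copies n bytes from src to dst (non-overlapping),
   bypassing the cache on writes when the copy is larger than the cache. */
void *cache_aware_memcpy(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    if (n >= ASSUMED_LARGEST_CACHE_BYTES) {
        unsigned char *d = (unsigned char *)dst;
        const unsigned char *s = (const unsigned char *)src;

        /* Streaming stores need a 16-byte-aligned destination, so copy
           the unaligned head (0..15 bytes) with ordinary stores first. */
        size_t head = (size_t)(-(uintptr_t)d & 15u);
        memcpy(d, s, head);
        d += head;
        s += head;
        n -= head;

        /* Main loop: unaligned load, non-temporal 16-byte store. */
        for (; n >= 16; n -= 16, d += 16, s += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)s);
            _mm_stream_si128((__m128i *)d, v);
        }

        /* Make the streaming stores visible before any later stores. */
        _mm_sfence();

        /* Remaining tail of fewer than 16 bytes. */
        memcpy(d, s, n);
        return dst;
    }
#endif
    /* Small copies, or no SSE2: the ordinary memcpy is the better choice. */
    return memcpy(dst, src, n);
}
```

Whether this actually beats the library memcpy depends on the CPU and on the library, so it needs to be measured on the target machine rather than assumed.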
