On Wed, May 13, 2020, 3:02 AM Mark Fletcher <mark2...@gmail.com> wrote:

> . So the question has become academic but I would like to get some
> sort of explanation so I can adjust for the future.
>

It used to be the case that AMD caches behaved very differently from
Intel's, and that shows up most when you stride across a big array like
that one. What you want is to adjust the stride to match cache behavior, so
that the L1/L2/L3 caches aren't thrashing as you step through it. I've seen
cases where it makes an order-of-magnitude difference even when there is no
memory pressure either way.
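If you want to see the stride effect for yourself, here is a rough sketch in Python. It is not taken from your code: the names strided_sum and time_strides, the array size, and the list of strides are all made up for illustration. Every run touches each element exactly once, so only the access pattern changes. In CPython the interpreter overhead hides much of the gap, and the numbers depend entirely on the machine, so treat it as a probe rather than a benchmark.

```python
import time
from array import array


def strided_sum(data, stride):
    """Visit every element exactly once, in passes that step by `stride`."""
    n = len(data)
    total = 0.0
    for start in range(stride):
        # Each pass touches data[start], data[start + stride], ...
        for i in range(start, n, stride):
            total += data[i]
    return total


def time_strides(n=1 << 20, strides=(1, 16, 256, 4096)):
    """Time the same amount of work at several strides (hypothetical sizes)."""
    data = array("d", [1.0]) * n  # contiguous buffer of n doubles
    results = {}
    for s in strides:
        t0 = time.perf_counter()
        strided_sum(data, s)
        results[s] = time.perf_counter() - t0
    return results


if __name__ == "__main__":
    for stride, seconds in time_strides().items():
        print(f"stride {stride:5d}: {seconds:.3f} s")
```

The same loop shape in a compiled language, with an array well beyond the size of the last-level cache, should show the difference much more plainly.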

Row-major versus column-major matrix striding is one of the usual sticking
points: walking along the storage order is cache-friendly, walking across it
is not.
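To make the row/column point concrete, here is a small sketch, again in Python and again with made-up names (make_matrix, sum_by_rows, sum_by_columns). It assumes the matrix is stored row-major in one flat buffer, which may or may not match what your environment does. Both functions compute the same sum; they differ only in which index moves fastest.

```python
from array import array


def make_matrix(nrows, ncols):
    """Flat row-major buffer: element (i, j) lives at index i * ncols + j."""
    return array("d", (float(k) for k in range(nrows * ncols)))


def sum_by_rows(m, nrows, ncols):
    """Inner loop walks along a row: consecutive addresses, stride 1."""
    total = 0.0
    for i in range(nrows):
        base = i * ncols
        for j in range(ncols):
            total += m[base + j]
    return total


def sum_by_columns(m, nrows, ncols):
    """Inner loop walks down a column: each step jumps ncols elements."""
    total = 0.0
    for j in range(ncols):
        for i in range(nrows):
            total += m[i * ncols + j]
    return total
```

Timing the two with timeit on something like a 2000 x 2000 matrix should show the column walk coming out slower. If the storage were column-major instead, the roles of the two functions would swap.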

Mark
>
>
