On 08-09-2010 at 15:58:31, BCS <[email protected]> wrote:

>> Can't you compute the Kronecker product lazily? E.g. a proxy object
>> that computes a value in an overloaded opIndex. Even if your
>> algorithms inspect (compute) the same value several times, you may
>> still win -- the bottleneck these days is memory access, not CPU
>> cycles.
>
> If enough elements of the 4-D matrix are accessed in the wrong order, the cache effects of computing it lazily might kill performance. I'd guess that highly optimized code for the precomputed version already exists.

Hm.. not sure what you mean by 'cache effects'. He was talking about working with a matrix of 200^4 doubles, which is the Kronecker product of two 200x200 matrices. Now, if my math is right, the lazy version only has to store the two factors: (2*200^2) * 8 = 640,000 bytes of memory. So the whole thing fits comfortably into the on-die cache, and large chunks can be loaded into the faster per-core caches.
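
To make the lazy idea concrete, here is a minimal sketch, written in Python rather than D only to keep it short. The class name LazyKronecker is made up for illustration, and __getitem__ plays the role that an overloaded opIndex would play in D. It assumes plain nested lists as input and stores nothing but the two factors.

```python
class LazyKronecker:
    """Read-only view of the Kronecker product of a and b.

    Only the two factor matrices are kept in memory; every element
    of the product is computed on demand when it is indexed.
    """

    def __init__(self, a, b):
        self.a = a
        self.b = b
        # Each element of a is expanded into a block shaped like b.
        self.p = len(b)
        self.q = len(b[0])
        self.shape = (len(a) * self.p, len(a[0]) * self.q)

    def __getitem__(self, index):
        i, j = index
        # Element (i, j) sits in block (i // p, j // q),
        # at offset (i % p, j % q) inside that block.
        return self.a[i // self.p][j // self.q] * self.b[i % self.p][j % self.q]


a = [[1, 2], [3, 4]]
b = [[0, 5], [6, 7]]
k = LazyKronecker(a, b)
print(k.shape)   # (4, 4)
print(k[0, 1])   # 5   (a[0][0] * b[0][1])
print(k[3, 2])   # 24  (a[1][1] * b[1][0])
```

Each lookup costs two integer divisions, two remainders and one multiplication, and touches only the two small factor matrices.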

I'd say that if cache effects can kill anything, it's access to the elements of the precomputed result, which takes 200^4 * 8 = 12,800,000,000 bytes (about 12.8 GB).
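
The two footprints can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 8-byte doubles and ignores any per-object or allocator overhead, and the variable names are mine.

```python
n = 200                 # each factor is an n x n matrix
bytes_per_double = 8

# Lazy version: only the two n x n factors are stored.
lazy_bytes = 2 * n**2 * bytes_per_double

# Precomputed version: all n^4 elements of the product are stored.
dense_bytes = n**4 * bytes_per_double

print(lazy_bytes)                 # 640000
print(dense_bytes)                # 12800000000
print(dense_bytes // lazy_bytes)  # 20000
```

So the precomputed result is 20,000 times larger than the data the lazy version has to keep around.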


Tomek
