On 08-09-2010 at 15:58:31, BCS <[email protected]> wrote:
>> Can't you compute the Kronecker product lazily? E.g. a proxy object
>> that computes a value in an overloaded opIndex. Even if your
>> algorithms inspect (compute) the same value several times, you may
>> still win -- the bottleneck these days is memory access, not CPU
>> cycles.
>
> If enough elements from the 4d matrix are accessed, in the wrong order,
> then the cache effects of doing it lazily might kill it. I'd guess that
> highly optimized code for doing the pre-compute version exists already.

Hm, I'm not sure what you mean by 'cache effects'. He was talking about
working with a 200^4-element matrix of doubles, which is the result of a
Kronecker product of two 200x200 matrices. Now, if my maths are right, the
lazy version only has to store the two operands: (2*200^2) * 8 = 640,000
bytes of memory. So the whole thing fits comfortably into the on-die cache,
and large chunks can be loaded into the faster per-core caches.
I'd say that if cache effects can kill anything, it'd be accessing elements
of the precomputed result, which is 200^4 * 8 = 12,800,000,000 bytes
(about 12.8 GB).
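
To make the lazy idea concrete, here is a minimal sketch of such a proxy in D.
The type name LazyKronecker, its fields and the row-major layout are my own
assumptions for illustration, not code from the original poster or any
library. It stores only the two operands and computes each element of the
product on demand in opIndex; the two static asserts just re-check the byte
counts above.

```d
// Hypothetical proxy: keeps only the operands A (ra x ca) and B (rb x cb),
// both stored row-major, and computes product elements on demand.
struct LazyKronecker
{
    const(double)[] a, b;
    size_t ra, ca, rb, cb;

    size_t rows() const { return ra * rb; }
    size_t cols() const { return ca * cb; }

    // (A (x) B)[i, j] = A[i / rb, j / cb] * B[i % rb, j % cb]
    double opIndex(size_t i, size_t j) const
    {
        assert(i < rows && j < cols);
        return a[(i / rb) * ca + j / cb] * b[(i % rb) * cb + j % cb];
    }
}

// Memory needed for n = 200, with 8-byte doubles.
enum ulong n = 200;
static assert(2 * n * n * double.sizeof == 640_000UL);              // lazy: two operands
static assert(n * n * n * n * double.sizeof == 12_800_000_000UL);   // precomputed result

void main()
{
    double[] a = [1, 2,
                  3, 4];
    double[] b = [0, 5,
                  6, 7];
    auto k = LazyKronecker(a, b, 2, 2, 2, 2);

    assert(k.rows == 4 && k.cols == 4);
    assert(k[0, 1] == 1 * 5); // A[0,0] * B[0,1]
    assert(k[3, 2] == 4 * 6); // A[1,1] * B[1,0]
}
```

Each lookup costs two integer divisions, two remainders and one multiply,
but it only ever touches the two small operand arrays, whatever order the
product is traversed in.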
Tomek