On Tuesday, August 16, 2016 at 9:12:51 AM UTC, Oliver Schulz wrote:
>
> I'd like to determine the optimal block size(s) for an array algorithm.
>
That is done, as I said, e.g. in BLAS. But is it still relevant? Cache-oblivious algorithms are supposed to be fast without needing to know cache sizes and other hardware details. They are all divide-and-conquer/recursive (recursion is a sufficient condition; there may be some additional ones, if I recall correctly): http://supertech.csail.mit.edu/papers/Prokop99.pdf

"*Conventional wisdom says that recursive procedures should be converted into iterative loops in order to improve performance [8]. While this strategy was effective ten years ago, many recursive programs now actually run faster than their iterative counterparts.* So far most of the work by architects and compiler writers is concentrated on loop-based iterative programs."

I think this applies to the last-level cache size, i.e. you need not know it. I still think you can exploit knowing the L1 size, and if I recall correctly, the paper does not go into details such as the cache-line size, which I think you can also exploit. I guess the base case of your recursion can stop at the L1 size and do something different there.

-- Palli
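To make the idea concrete, here is a minimal sketch (in Python, just for illustration) of a cache-oblivious matrix transpose in the Prokop style: recurse by halving the larger dimension, so subproblems eventually fit in every level of the cache without knowing any cache size. The `BASE` cutoff is a hypothetical tuning knob standing in for the "stop at the L1 size and do something different" idea; a purely cache-oblivious version would recurse all the way down, and the cutoff only amortizes call overhead.

```python
BASE = 16  # hypothetical base-case side length; could be tuned to fit L1

def transpose(src, dst, r0, c0, rows, cols):
    """Write the transpose of src[r0:r0+rows, c0:c0+cols] into dst."""
    if rows <= BASE and cols <= BASE:
        # Base case: small enough to transpose directly.
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        # Split the larger dimension in half and recurse on both halves.
        half = rows // 2
        transpose(src, dst, r0, c0, half, cols)
        transpose(src, dst, r0 + half, c0, rows - half, cols)
    else:
        half = cols // 2
        transpose(src, dst, r0, c0, rows, half)
        transpose(src, dst, r0, c0 + half, rows, cols - half)

# Usage: transpose a 40x33 matrix into a 33x40 one.
n, m = 40, 33
a = [[i * m + j for j in range(m)] for i in range(n)]
b = [[0] * n for _ in range(m)]
transpose(a, b, 0, 0, n, m)
```

The point of the recursive splitting is that at some depth the working set fits in L1 (and, at shallower depths, in L2 and the last-level cache), so locality falls out of the structure rather than from hand-tuned block sizes.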
