On Tuesday, August 16, 2016 at 9:12:51 AM UTC, Oliver Schulz wrote:
>
> I'd like to determine the optimal block size(s) for an array algorithm.
>

That is done, as I said, e.g. in BLAS. But is it still relevant? 
Cache-oblivious algorithms are supposed to be fast without needing to know 
cache sizes and other hardware details. They are all 
divide-and-conquer/recursive (a sufficient condition; there may be a few 
more, if I recall correctly):

http://supertech.csail.mit.edu/papers/Prokop99.pdf

"*Conventional wisdom says that recursive procedures should be converted 
into iterative loops in order to improve performance [8]. While this 
strategy was effective ten years ago, many recursive programs now actually 
run faster than their iterative counterparts.* So far most of the work by 
architects and compiler writers is concentrated on loop-based iterative 
programs."

I think this applies to the last-level cache size, i.e. you need not know 
it. I still think you can exploit knowing the L1 size, and if I recall 
correctly, this paper does not go into details such as cache-line size, 
which I think you can also exploit.

I guess the base case of your recursion can stop at the L1 size and do 
something different there.

--
Palli
