Fawzi Mohamed wrote:
On 2009-03-22 09:45:32 +0100, Don <[email protected]> said:

Trass3r wrote:
Don wrote:
I abandoned it largely because array operations got into the language; since then I've been working on getting the low-level math language stuff working.
Don't worry, I haven't gone away!

I see.


http://www.dsource.org/projects/lyla

Though array operations still only give us SIMD and no multithreading (?!).

There's absolutely no way you'd want multithreading on a BLAS1 operation. It's not until BLAS3 that you become computation-limited.

Not true: if your vector is large, you could still use several threads.

That's surprising. I confess to never having benchmarked it, though.
If the vector is large, all threads are competing for the same L2 and L3 cache bandwidth, right? (Assuming a typical x86 setup where each core has its own L1 cache and the L2 and L3 caches are shared.) So multiple cores should never be beneficial whenever the RAM->L3 or L3->L2 bandwidth is the bottleneck, which will be the case for most BLAS1-style operations at large sizes. And at small sizes, the thread overhead is significant, wiping out any potential benefit.
What have I missed?
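
(A rough count of my own, not a benchmark, of why the BLAS level matters here, assuming 8-byte doubles:

\[
\text{axpy (BLAS1): } \frac{2n\ \text{flops}}{3n \cdot 8\ \text{bytes}} \approx \frac{1}{12}\ \text{flops/byte},
\qquad
\text{gemm (BLAS3): } \frac{2n^3\ \text{flops}}{4n^2 \cdot 8\ \text{bytes}} \approx \frac{n}{16}\ \text{flops/byte}.
\]

So a BLAS1 kernel does O(1) work per byte moved no matter how large the vector is, while a BLAS3 kernel does O(n) work per byte and eventually stops being bandwidth-bound.)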

But you are right that using multiple threads at a low level is a dangerous thing, because it might be better to use just one thread and parallelize another operation at a higher level. So you need to know, roughly, how many threads are really available for that operation.

Yes, if you have a bit more context, it can be a clear win.

I am trying to tackle that problem in blip with a global scheduler, which I am currently rewriting.

I look forward to seeing it!
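
Just to make the "how many threads are really available" point concrete, here is the kind of trivial decision I imagine a low-level kernel delegating upwards. Purely illustrative; the names (threadsForVectorOp, freeWorkers) are made up and have nothing to do with blip's actual design:

import std.algorithm : min;

// Hypothetical helper: pick a thread count for one vector operation,
// given how many workers a higher-level scheduler says are currently free.
size_t threadsForVectorOp(size_t vectorLength, size_t freeWorkers)
{
    // Below this, per-thread startup and synchronisation cost outweighs the work.
    enum size_t minElementsPerThread = 1_000_000;
    auto useful = vectorLength / minElementsPerThread;
    if (useful <= 1 || freeWorkers <= 1)
        return 1;                        // run serially; let a higher level parallelize
    return min(useful, freeWorkers);     // never spawn more threads than will pay off
}

The point is only that the kernel asks the scheduler instead of deciding in isolation.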


I think the best approach is lyla's: take an existing, optimized C BLAS library and write some kind of wrapper using operator overloading etc. to make programming easier and more intuitive.

blip.narray.NArray does that if compiled with -version=blas, but I think that for large vectors/matrices you can do better (precisely by using multithreading).

I suspect that with 'shared' and 'immutable' arrays, D can do better than C, in theory. I hope it works out in practice.


In my opinion, we actually need matrices in the standard library, with a very small number of primitive operations built in (much as Fortran does). Beyond those, I agree, wrappers around an existing library should be used.
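
For concreteness, here is a minimal sketch of what such a wrapper could look like, assuming a C BLAS implementation is linked in. The Matrix type below is made up for illustration; it is not lyla's or NArray's API:

// The standard cblas_dgemm entry point (the CBLAS enums are passed as plain ints).
extern(C) void cblas_dgemm(int order, int transA, int transB,
                           int m, int n, int k,
                           double alpha, const(double)* a, int lda,
                           const(double)* b, int ldb,
                           double beta, double* c, int ldc);

struct Matrix
{
    double[] data;   // row-major storage
    int rows, cols;

    // a * b dispatches straight to the optimized BLAS gemm.
    Matrix opBinary(string op : "*")(Matrix rhs)
    {
        assert(cols == rhs.rows);
        auto result = Matrix(new double[rows * rhs.cols], rows, rhs.cols);
        // 101 = CblasRowMajor, 111 = CblasNoTrans
        cblas_dgemm(101, 111, 111, rows, rhs.cols, cols,
                    1.0, data.ptr, cols, rhs.data.ptr, rhs.cols,
                    0.0, result.data.ptr, rhs.cols);
        return result;
    }
}

With that in place, auto c = a * b; reads naturally while all the heavy lifting stays in the tuned C library.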
