Fawzi Mohamed wrote:
On 2009-03-22 09:45:32 +0100, Don said:
Trass3r wrote:
Don schrieb:
I abandoned it largely because array operations got into the
language; since then I've been working on the low-level math support
in the language.
Don't worry, I haven't gone away!
I see.
http://www.dsource.org/projects/lyla
Though array operations still only give us SIMD and no multithreading
(?!).
There's absolutely no way you'd want multithreading on a BLAS1
(vector-vector) operation. It's not until BLAS3 (matrix-matrix) that
you become computation-limited.
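The BLAS1/BLAS3 distinction can be made concrete by counting arithmetic intensity: floating-point operations per word moved between memory and the CPU. The count below uses axpy as the representative BLAS1 operation and gemm as the representative BLAS3 operation. Both choices are for illustration only. The gemm figure also assumes each matrix is moved only once, which blocked implementations can only approximate.

```latex
\text{axpy } (y \leftarrow \alpha x + y,\; x, y \in \mathbb{R}^{n}):\quad
\frac{\text{flops}}{\text{words moved}} = \frac{2n}{3n} = \frac{2}{3}

\text{gemm } (C \leftarrow \alpha AB + \beta C,\; A, B, C \in \mathbb{R}^{n \times n}):\quad
\frac{\text{flops}}{\text{words moved}} \approx \frac{2n^{3}}{4n^{2}} = \frac{n}{2}
```

The BLAS1 ratio is a constant, so memory traffic rather than arithmetic limits it at every size. The BLAS3 ratio grows with n, which is why only BLAS3 routines become computation-limited.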
Not true: if your vector is large, you could still use several threads.
That's surprising. I confess to never having benchmarked it, though.
If the vector is large, all threads are competing for the same L2 and L3
cache bandwidth, right?
(Assuming a typical x86 situation where every CPU has an L1 cache and
the L2 and L3 caches are shared).
So multiple cores should never be beneficial whenever the RAM->L3 or
L3->L2 bandwidth is the bottleneck, which will be the case for most
BLAS1-style operations at large sizes.
And at small sizes, the thread overhead is significant, wiping out any
potential benefit.
What have I missed?
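Below is a minimal C++ sketch of what "using several threads on a large vector" means for a BLAS1 operation, taking axpy (y += a*x) as the example. The names `axpy_range` and `parallel_axpy` are hypothetical, and so is the `min_chunk` cutoff. The cutoff stands in for the thread-overhead point above and is not a tuned value. Whether the extra threads actually help depends on whether a single core can already saturate the machine's memory bandwidth. That is exactly the open question here, and only benchmarking can settle it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Serial BLAS1 kernel: y[i] += a * x[i] for i in [begin, end).
// 2 flops per element against 3 memory accesses (read x, read y, write y).
void axpy_range(double a, const double* x, double* y,
                std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        y[i] += a * x[i];
    }
}

// Splits y += a*x into contiguous, disjoint chunks, one per thread.
// min_chunk (hypothetical, untuned) is the smallest per-thread chunk worth
// a thread; below it the work is done on the calling thread alone.
void parallel_axpy(double a, const std::vector<double>& x, std::vector<double>& y,
                   unsigned num_threads, std::size_t min_chunk = 1 << 16) {
    assert(x.size() == y.size());
    const std::size_t n = y.size();
    if (num_threads == 0) num_threads = 1;  // e.g. hardware_concurrency() may return 0

    // Cap the thread count so every thread gets at least min_chunk elements.
    const std::size_t max_useful = std::max<std::size_t>(1, n / min_chunk);
    const std::size_t t = std::min<std::size_t>(num_threads, max_useful);
    if (t <= 1) {
        axpy_range(a, x.data(), y.data(), 0, n);
        return;
    }

    std::vector<std::thread> workers;
    workers.reserve(t);
    const std::size_t chunk = (n + t - 1) / t;  // ceiling division
    for (std::size_t k = 0; k < t; ++k) {
        const std::size_t begin = k * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        // Chunks do not overlap, so the threads never write the same element.
        workers.emplace_back(axpy_range, a, x.data(), y.data(), begin, end);
    }
    for (auto& w : workers) w.join();
}
```

Calling `parallel_axpy(2.0, x, y, std::thread::hardware_concurrency())` on two equal-length vectors updates `y` in place. The thread count is left to the caller so that a higher-level scheduler can decide how many threads this one operation may use.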
But you are right that using multiple threads at a low level is
dangerous, because it might be better to use just one thread and
parallelize another operation at a higher level.
Thus you need to know, at least roughly, how many threads are really
available for that operation.
Yes, if you have a bit more context, it can be a clear win.
I am trying to tackle that problem in blip with a global scheduler,
which I am currently rewriting.
I look forward to seeing it!
I think the best approach is lyla's: take an existing, optimized C BLAS
library and write a wrapper around it, using operator overloading etc.,
to make programming easier and more intuitive.
blip.narray.NArray does that if compiled with -version=blas, but I think
that for large vectors/matrices you can do better (precisely by using
multithreading).
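Here is a minimal sketch of the wrapper approach, written in C++ rather than D (D's operator overloading supports the same pattern). `2.0 * x` builds a small unevaluated object instead of a temporary vector, and `y += ...` turns that object into a single axpy call. `Vec`, `Scaled` and `axpy_kernel` are hypothetical names. `axpy_kernel` is a plain loop standing in for the optimized C BLAS routine (for example `cblas_daxpy`) that a real wrapper would call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a BLAS level-1 axpy. A real wrapper would forward to an
// optimized C BLAS here; a plain loop keeps the sketch self-contained.
void axpy_kernel(std::size_t n, double a, const double* x, double* y) {
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}

class Vec;

// Unevaluated "a * x": holds a reference, allocates no temporary vector.
struct Scaled {
    double a;
    const Vec& x;
};

class Vec {
public:
    explicit Vec(std::size_t n, double fill = 0.0) : data_(n, fill) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
    const double* data() const { return data_.data(); }
    double* data() { return data_.data(); }

    // y += a * x maps onto exactly one axpy call.
    Vec& operator+=(const Scaled& s) {
        assert(s.x.size() == size());
        axpy_kernel(size(), s.a, s.x.data(), data());
        return *this;
    }
    // y += x is axpy with a == 1.
    Vec& operator+=(const Vec& x) { return *this += Scaled{1.0, x}; }

private:
    std::vector<double> data_;
};

inline Scaled operator*(double a, const Vec& x) { return Scaled{a, x}; }
```

Keeping `a * x` unevaluated is what lets `y += a * x` become one library call with no temporary. Expression-template libraries apply the same idea more generally. Because `Scaled` only holds a reference, it has to be consumed in the same expression that creates it.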
I suspect that with 'shared' and 'immutable' arrays, D can do better
than C, in theory. I hope it works out in practice.
In my opinion, we actually need matrices in the standard library, with
a very small number of primitive operations built-in (much like
Fortran does). Outside those, I agree, wrappers to an existing library
should be used.