06.05.2020 07:52, data pulverizer wrote:
On Wednesday, 6 May 2020 at 04:04:14 UTC, Mathias LANG wrote:
On Wednesday, 6 May 2020 at 03:41:11 UTC, data pulverizer wrote:
Yes, that's exactly what I want; the actual computation I'm running is
much more expensive and much larger. It shouldn't matter if I have
like 100_000_000 threads, should it? The threads should just be queued
until the CPU works on them?
It does matter quite a bit. Each thread has its own resources
allocated to it, and some parts of the language need to interact
with *all* threads, e.g. the GC.
In general, if you want to parallelize something, you should aim to
have as many threads as you have cores. Having 100M threads will mean
you have to do a lot of context switches. You might want to look up
the difference between tasks and threads.
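To illustrate the task/thread distinction, here is a minimal sketch using
`std.parallelism`. It assumes the default `taskPool`, and the function name
`parallelSum` is made up for the example. The 10_000 work items are queued
and picked up by a small, fixed set of worker threads; no thread is created
per item.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : taskPool, totalCPUs;
import std.range : iota;

/// Sums 0 .. nTasks-1, handing out each index as its own work item.
/// (Hypothetical helper, for illustration only.)
long parallelSum(int nTasks)
{
    shared long sum = 0;
    // Work unit size 1: every index is a separate queued work item,
    // but the number of threads stays at the pool size.
    foreach (i; taskPool.parallel(iota(nTasks), 1))
        atomicOp!"+="(sum, i);
    return atomicLoad(sum);
}

void main()
{
    import std.stdio : writeln;

    // The default pool has a fixed number of workers tied to the core count.
    writeln("cores: ", totalCPUs, ", pool workers: ", taskPool.size);
    assert(parallelSum(10_000) == 49_995_000);
}
```

The atomic add is only there to keep the toy example race-free; in real
code each work item would normally write to its own output location.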
Sorry, I meant 10_000, not 100_000_000. I squared the number by mistake
because I'm calculating a 10_000 x 10_000 matrix. It's only 10_000 tasks,
so 1 task does 10_000 calculations. The actual bit of code I'm
parallelising is here:
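```
import std.parallelism : taskPool;
import std.range : iota;

// Fills the symmetric n x n kernel matrix, one parallel work item per column j.
auto calculateKernelMatrix(T)(AbstractKernel!(T) K, Matrix!(T) data)
{
    long n = data.ncol;
    auto mat = new Matrix!(T)(n, n);
    foreach(j; taskPool.parallel(iota(n)))
    {
        auto arrj = data.refColumnSelect(j).array;
        // Only entries with i >= j are computed; each result is mirrored.
        for(long i = j; i < n; ++i)
        {
            mat[i, j] = K.kernel(data.refColumnSelect(i).array, arrj);
            mat[j, i] = mat[i, j];
        }
    }
    return mat;
}
```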
At the moment this code is running a little bit faster than threaded,
SIMD-optimised Julia code, but as I said in an earlier reply to Ali, when
I look at my system monitor I can see that all the D threads are active
and running at ~40% usage, meaning that they are mostly doing nothing.
The Julia code runs all threads at 100% and is still a tiny bit slower,
so my (maybe incorrect?) assumption is that I could get more performance
from D. The method `refColumnSelect(j).array` is trying to reference a
column from a matrix (1D array with computed index referencing), which I
select from the matrix using:
```
return new Matrix!(T)(data[startIndex..(startIndex + nrow)], [nrow, 1]);
```
If I use the above code, am I wrong in assuming that the sliced data
(T[]) is referenced rather than copied? So that if I do:
```
auto myData = data[5..10];
```
myData is referencing elements [5..10] of data and not creating a new
array with elements data[5..10] copied?
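On the slicing question: as far as I know, slicing a dynamic array in D
gives a view onto the same memory, so nothing is copied. A small sketch
that checks this is below. The function names are made up for the
example, and `.array` here is `std.array.array`; whether the `.array` in
your code copies depends on how your `Matrix` defines it, which the
snippet doesn't show.

```d
import std.array : array;

/// Writes through a slice and returns what the original array then holds.
double writeThroughSlice()
{
    auto data = [0.0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];
    auto myData = data[5 .. 10];        // elements 5, 6, 7, 8, 9 (end excluded)
    assert(myData.length == 5);
    assert(myData.ptr is data.ptr + 5); // same memory, no copy
    myData[0] = 42;                     // visible through `data` as well
    return data[5];
}

/// Checks that .dup and std.array.array both allocate independent copies.
bool copiesAreIndependent()
{
    auto data = [1.0, 2, 3, 4];
    auto view = data[1 .. 3];
    auto viaDup = view.dup;
    auto viaArray = view.array;
    viaDup[0] = -1;
    viaArray[0] = -2;
    // The original is untouched and both copies live at new addresses.
    return data[1] == 2 && viaDup.ptr !is view.ptr && viaArray.ptr !is view.ptr;
}

void main()
{
    assert(writeThroughSlice() == 42);
    assert(copiesAreIndependent());
}
```

Note that the end index of a slice is exclusive, so `data[5..10]` covers
five elements, indices 5 through 9.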
General advice: try to avoid using `array` and `new` in hot code.
Memory allocation is slow in general, unless you use carefully
crafted custom memory allocators. That can easily be the reason for the
40% CPU usage: the cores are waiting on the memory subsystem.
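As a rough sketch of what I mean, assuming the data sits in one flat
column-major `double[]`: columns are taken as slices, and the only
allocation is the result buffer, made once before the loop. The names
`dot`, `column` and `kernelMatrix` are made up here, and `dot` is just a
stand-in for your `K.kernel`. I can't promise this is the cause of the
40%, but it takes allocation out of the picture.

```d
import std.parallelism : taskPool;
import std.range : iota;

/// Hypothetical stand-in for K.kernel: a plain dot product.
double dot(const(double)[] x, const(double)[] y)
{
    double s = 0;
    foreach (k; 0 .. x.length)
        s += x[k] * y[k];
    return s;
}

/// Column j of a column-major nrow x ncol buffer, as a view (no allocation).
const(double)[] column(const(double)[] data, size_t nrow, size_t j)
{
    return data[j * nrow .. (j + 1) * nrow];
}

/// Symmetric ncol x ncol kernel matrix, stored flat and column-major.
double[] kernelMatrix(const(double)[] data, size_t nrow, size_t ncol)
{
    auto mat = new double[](ncol * ncol); // single allocation, outside the loop
    foreach (j; taskPool.parallel(iota(ncol)))
    {
        auto colJ = column(data, nrow, j);
        foreach (i; j .. ncol)
        {
            immutable v = dot(column(data, nrow, i), colJ);
            mat[i + j * ncol] = v; // entry (i, j)
            mat[j + i * ncol] = v; // mirrored entry (j, i)
        }
    }
    return mat;
}

void main()
{
    // 2 x 3 matrix with columns [1, 2], [3, 4], [5, 6].
    double[] data = [1, 2, 3, 4, 5, 6];
    auto m = kernelMatrix(data, 2, 3);
    assert(m == [5.0, 11, 17, 11, 25, 39, 17, 39, 61]);
}
```

Each pair (i, j) is written by the work item for the smaller index only,
so the parallel loop needs no locking.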