Re: multithread/concurrency/parallel methods and performance

Dmitry Olshansky via Digitalmars-d-learn Mon, 19 Feb 2018 21:46:23 -0800

On Monday, 19 February 2018 at 14:57:22 UTC, SrMordred wrote:

On Monday, 19 February 2018 at 05:54:53 UTC, Dmitry Olshansky wrote:
The operation is trivial and the dataset is rather small. In such cases SIMD with e.g. array ops is the way to go:
result[] = values[] * values2[];
Yes, absolutely right :)
I made a simple example to understand why the threads are not scaling the way I thought they would.
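
For reference, the array-op one-liner quoted above can be wrapped into a small program. This is only a minimal sketch: the function name multiply and the sample data are made up for illustration, and whether the compiler really emits SIMD instructions depends on the compiler and flags.

```d
import std.stdio : writeln;

/// Element-wise product written with D's built-in array operations.
/// `multiply` is a hypothetical helper name, not something from the thread.
float[] multiply(const float[] values, const float[] values2)
{
    assert(values.length == values2.length);
    auto result = new float[values.length];
    // One statement over whole slices; no explicit loop, no threads.
    result[] = values[] * values2[];
    return result;
}

void main()
{
    float[] values = [1.0f, 2.0f, 3.0f, 4.0f];
    float[] values2 = [2.0f, 2.0f, 2.0f, 2.0f];
    writeln(multiply(values, values2));
}
```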

Yeah, the world is an ugly place where trivial math sometimes doesn’t work.


I suggest you:
- run with different numbers of threads, from 1 to n
- vary sizes from 100k to 10m
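
A sweep like that could look roughly like the sketch below. It assumes std.parallelism and std.datetime.stopwatch; multiplyWithThreads and the fill values are hypothetical, and each timing deliberately includes creating the pool and allocating the result, so treat the numbers as rough.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : TaskPool, totalCPUs;
import std.stdio : writefln;

/// Multiplies a and b element-wise using `threads` threads in total.
/// Hypothetical helper: the calling thread counts as one of them.
float[] multiplyWithThreads(const float[] a, const float[] b, uint threads)
{
    assert(a.length == b.length && threads >= 1);
    auto result = new float[a.length];
    if (threads == 1)
    {
        // Plain serial loop, no pool involved.
        foreach (i, ref r; result)
            r = a[i] * b[i];
        return result;
    }
    // A fresh pool per call: the caller also works, so threads - 1 workers.
    auto pool = new TaskPool(threads - 1);
    scope (exit) pool.finish(true);
    immutable workUnit = a.length / threads + 1;
    foreach (i, ref r; pool.parallel(result, workUnit))
        r = a[i] * b[i];
    return result;
}

void main()
{
    foreach (size; [100_000, 1_000_000, 10_000_000])
    {
        auto a = new float[size];
        auto b = new float[size];
        a[] = 1.5f;
        b[] = 2.0f;
        foreach (threads; 1 .. totalCPUs + 1)
        {
            auto sw = StopWatch(AutoStart.yes);
            multiplyWithThreads(a, b, threads);
            sw.stop();
            writefln("size=%s threads=%s time=%s us",
                     size, threads, sw.peek.total!"usecs");
        }
    }
}
```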

For your numbers: 400ms / 64 is ~6ms; if we divide by the number of cores it’s 6/7 ~ 0.86ms, which is a good deal smaller than a CPU timeslice.

In essence a single core runs fast b/c it doesn’t wait for all the others to complete via join, easily burning its quota in one go. In MT I bet some of the overhead comes from not all threads finishing (and starting) at once, so the join blocks in the kernel.

You could run your MT code with strace to see if it hits the futex call or some such; if it does, that’s where you are getting delays. (That’s assuming you are on Linux.)

The std.parallelism version is a bit faster b/c, I think, it caches the created thread pool, so you don’t start threads anew on each run.
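
To make the difference concrete, here is a sketch of the two styles side by side. Both function names are hypothetical; the first starts and joins new threads on every call, the second goes through the shared taskPool, whose worker threads stay alive between calls.

```d
import core.thread : Thread;
import std.algorithm.comparison : min;
import std.parallelism : taskPool;

/// Starts `threads` new threads on every call and joins them all at the end.
void multiplyFreshThreads(float[] result, const float[] a, const float[] b,
                          size_t threads)
{
    assert(threads >= 1);
    immutable chunk = (result.length + threads - 1) / threads;

    // Helper so each thread's closure gets its own lo/hi values.
    Thread spawn(size_t lo, size_t hi)
    {
        auto t = new Thread({
            foreach (i; lo .. hi)
                result[i] = a[i] * b[i];
        });
        t.start();
        return t;
    }

    Thread[] workers;
    foreach (t; 0 .. threads)
    {
        immutable lo = min(t * chunk, result.length);
        immutable hi = min(lo + chunk, result.length);
        workers ~= spawn(lo, hi);
    }
    foreach (w; workers)
        w.join(); // blocks until every thread is done
}

/// Uses the global taskPool; its threads are created once and reused.
void multiplyPooled(float[] result, const float[] a, const float[] b)
{
    foreach (i, ref r; taskPool.parallel(result))
        r = a[i] * b[i];
}

void main()
{
    auto a = new float[1_000_000];
    auto b = new float[1_000_000];
    auto result = new float[1_000_000];
    a[] = 1.5f;
    b[] = 2.0f;
    foreach (run; 0 .. 64)
    {
        multiplyFreshThreads(result, a, b, 4); // pays thread startup 64 times
        multiplyPooled(result, a, b);          // reuses the same workers
    }
}
```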

I imagine that if one core’s work is done in 200ms, a 4-core run will be done in 50ms plus some overhead, since they are working on separate blocks of memory, without need of sync and without false sharing, etc. (at least I think I don’t have this problem here).

If you had a long queue of small tasks like that and you don’t wait to join all threads until absolutely required, you get near perfect scalability (unless hitting other bottlenecks like RAM).
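
The queue idea might be sketched like this, assuming std.parallelism’s task and taskPool. The names multiplyInto and multiplyQueued and the chunk size are made up for illustration; the point is only that jobs are enqueued without waiting and the caller blocks once, at the very end.

```d
import std.algorithm.comparison : min;
import std.parallelism : task, taskPool;

/// One small job: multiply a single chunk in place.
void multiplyInto(float[] result, const(float)[] a, const(float)[] b)
{
    result[] = a[] * b[];
}

/// Queues one job per chunk and waits only after everything is queued.
void multiplyQueued(float[] result, const(float)[] a, const(float)[] b,
                    size_t chunk)
{
    assert(chunk > 0 && a.length == result.length && b.length == result.length);
    alias Job = typeof(task!multiplyInto(result, a, b));
    Job[] jobs;
    for (size_t lo = 0; lo < result.length; lo += chunk)
    {
        immutable hi = min(lo + chunk, result.length);
        auto job = task!multiplyInto(result[lo .. hi], a[lo .. hi], b[lo .. hi]);
        taskPool.put(job); // enqueue and move on, no join here
        jobs ~= job;
    }
    // The only place the caller waits: when the results are actually needed.
    foreach (job; jobs)
        job.yieldForce;
}

void main()
{
    auto a = new float[1_000_000];
    auto b = new float[1_000_000];
    auto result = new float[1_000_000];
    a[] = 1.5f;
    b[] = 2.0f;
    multiplyQueued(result, a, b, 10_000);
    assert(result[0] == 3.0f && result[$ - 1] == 3.0f);
}
```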

