Thank you, folks, for your hints and suggestions!
Indeed, I rewrote the code, and it is now substantially faster and
parallelizes well.
Instead of parallelizing only the inner loop, I parallelized both
loops. To do that, I had to convert the 2D index into a 1D index
and then back to 2D. Essentially, I calculate each element Aij of
the matrix and store everything in a 1D array.
And yes, `A = A ~ Aij` was very slow; to avoid it I used the 2D
-> 1D mapping. I will check your solution as well, since I like
it too.
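For anyone finding this thread later, here is a minimal sketch of the collapsed loop described above. It is not my actual code: `element` is a hypothetical stand-in for the real Aij computation, `fillMatrix` is a name made up for illustration, and I assume row-major storage with `k = i * m + j`.

```D
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical stand-in for the real computation of element A[i][j].
double element(size_t i, size_t j)
{
    return cast(double)(i * 10 + j);
}

// Fill an n-by-m matrix stored row-major in a flat array, using a single
// parallel loop over the collapsed index k = i * m + j.
double[] fillMatrix(size_t n, size_t m)
{
    auto a = new double[](n * m); // allocated once, nothing is appended
    foreach (k; parallel(iota(n * m)))
    {
        immutable i = k / m; // recover the row index
        immutable j = k % m; // recover the column index
        a[k] = element(i, j); // every k writes only its own slot
    }
    return a;
}

void main()
{
    auto a = fillMatrix(3, 4);
    assert(a.length == 12);
    assert(a[1 * 4 + 2] == 12.0); // element (1, 2)
}
```

Because each iteration writes to a distinct index of a preallocated array, the iterations do not depend on each other, and no concatenation is needed at all.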
The more I use the D Language, the more I like it.
On Tuesday, 18 October 2022 at 16:07:22 UTC, Siarhei Siamashka wrote:
On Tuesday, 18 October 2022 at 11:56:30 UTC, Yura wrote:
```D
// Then for each Sphere, i.e. dot[i]
// I need to do some arithmetics with itself and other dots
// I have only parallelized the inner loop, i is fixed.
```
It's usually a much better idea to parallelize the outer loop.
Even OpenMP tutorials explain this:
https://ppc.cs.aalto.fi/ch3/nested/ (check the "collapse it
into one loop" suggestion from it).
```D
for (auto j=0;j<Ai.length;j++) {
A = A ~ Ai[j];
}
```
This way of appending to an array is very slow and `A ~=
Ai[j];` is much faster. And even better would be `A ~= Ai;`
instead of the whole loop.
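
A small sketch of the suggestion above, with made-up helper names (`appendWhole`, `appendWithAppender`) and `int` elements chosen only for illustration. The second function shows `std.array.appender`, which is not mentioned in the quoted post but is a common choice when an array really has to be built element by element.

```D
import std.array : appender;

// Append a whole array in one operation instead of looping over it.
int[] appendWhole(int[] a, int[] ai)
{
    a ~= ai;
    return a;
}

// Alternative for element-by-element building (an addition, not from the post).
int[] appendWithAppender(int[] ai)
{
    auto app = appender!(int[])();
    app.reserve(ai.length); // reserve capacity up front
    foreach (x; ai)
        app.put(x);
    return app.data;
}

void main()
{
    assert(appendWhole([1, 2], [3, 4]) == [1, 2, 3, 4]);
    assert(appendWithAppender([3, 4]) == [3, 4]);
}
```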