Thank you, folks, for your hints and suggestions!

Indeed, I rewrote the code, and it is now substantially faster and parallelizes well.

Instead of parallelizing only the inner loop, I parallelized both loops together. To do that I had to convert the 2d index into a 1d index, and then back to 2d. Essentially, I calculate each element Aij of the matrix and store everything in a 1d array.

And yes, `A = A ~ Aij` was very slow; to avoid it I used the 2d -> 1d mapping. I will check your solution as well, as I like it too.
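
Roughly, the idea looks like the sketch below. This is not my actual code: `pairTerm` and `buildFlat` are placeholder names made up for illustration, and the arithmetic inside `pairTerm` is a dummy standing in for the real per-pair calculation. It assumes a square n x n matrix stored row-major, so element (i, j) lives at index `i * n + j`.

```D
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical stand-in for the real arithmetic between dot i and dot j.
double pairTerm(size_t i, size_t j) pure nothrow @nogc
{
    return cast(double)(i * 10 + j);
}

// Fills an n x n matrix stored row-major in one preallocated 1d array.
double[] buildFlat(size_t n)
{
    auto A = new double[](n * n);      // allocated once, nothing is appended
    foreach (k; parallel(iota(n * n))) // one flat loop instead of two nested ones
    {
        immutable i = k / n;           // 1d -> 2d: row
        immutable j = k % n;           // 1d -> 2d: column
        A[k] = pairTerm(i, j);         // k == i * n + j, each slot has a single writer
    }
    return A;
}

unittest
{
    // Checks the row-major layout for a 2 x 2 matrix.
    auto A = buildFlat(2);
    assert(A == [0.0, 1.0, 10.0, 11.0]);
}
```

Since every iteration writes only to its own `A[k]`, no locking is needed, and the array never has to grow, which is what made the concatenation version slow.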

The more I use the D Language, the more I like it.

On Tuesday, 18 October 2022 at 16:07:22 UTC, Siarhei Siamashka wrote:
On Tuesday, 18 October 2022 at 11:56:30 UTC, Yura wrote:
```D
// Then for each Sphere, i.e. dot[i]
// I need to do some arithmetics with itself and other dots
// I have only parallelized the inner loop, i is fixed.
```

It's usually a much better idea to parallelize the outer loop. Even OpenMP tutorials explain this: https://ppc.cs.aalto.fi/ch3/nested/ (check the "collapse it into one loop" suggestion from it).

```D
for (auto j=0;j<Ai.length;j++) {
  A = A ~ Ai[j];
}
```

This way of appending to an array is very slow and `A ~= Ai[j];` is much faster. And even better would be `A ~= Ai;` instead of the whole loop.

