On Friday, 19 August 2022 at 02:02:57 UTC, Adam D Ruppe wrote:

> Even if they aren't equal, you'll get decent benefit from parallel on the outer one alone, but not as good since the work won't be balanced.

Unless there's some kind of blocking going on in D's implementation, applying parallel to the outer loop is the best you can do as long as the number of passes on the outer loop is large enough relative to the number of cores: uneven amounts of work on the inner loop will get spread out across the cores. There are always counterexamples, but the ratio of passes to cores needs to be pretty low before further optimizations have any chance of helping.
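
A minimal sketch of the outer-loop approach with std.parallelism, in case it helps. The function name rowSums and the workload (pass i runs i + 1 inner iterations, so the passes are deliberately uneven) are made up for illustration; the work unit size of 1 is my assumption about how you'd want to hand out passes, not something from the original code.

```d
import std.parallelism : parallel;
import std.range : iota;

// Hypothetical workload: outer pass i does i + 1 inner iterations,
// so the cost per pass is uneven.
ulong[] rowSums(size_t rows)
{
    auto sums = new ulong[rows];

    // Only the outer loop is parallel. The second argument is the work
    // unit size; 1 means each outer pass is handed out on its own
    // (an assumption for this sketch, tune it for cheap passes).
    foreach (i; parallel(iota(rows), 1))
    {
        ulong acc = 0;
        foreach (j; 0 .. i + 1)   // plain serial inner loop
            acc += j;
        sums[i] = acc;            // each pass writes only its own slot
    }
    return sums;
}
```

Calling rowSums(1000) on a machine with a handful of cores gives a few hundred passes per core, which is the situation where the cheap and expensive passes should even out without any extra work on the inner loop.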
