On 27/12/2021 12:10 AM, max haughton wrote:
I would start by removing the use of stdout in your loop kernel - I'm not familiar with what you are calculating, but if you can basically have the (parallel) loop operate from (say) one array directly into another then you can get extremely good parallel scaling with almost no effort.

Keeping stdout out of the actual loop should make the code faster even without threads, because a function call in the hot code means the compiler's optimizer will give up on certain transformations. In other words, do all the work as compactly as possible, then output the data in one step at the end.

It'll speed it up significantly.
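
A rough sketch of that shape, written in Rust purely for illustration since the original code isn't shown here. The names `kernel`, `compute_parallel` and `render` are made up, and `kernel` is only a placeholder for whatever is really being calculated. The parallel loop goes from one array straight into another with no I/O inside it, and everything is written out in a single step afterwards:

```rust
use std::io::{self, Write};
use std::thread;

// Hypothetical stand-in for the real per-element calculation.
fn kernel(x: u64) -> u64 {
    x * x + 1
}

// Fill `output` from `input` using up to `workers` threads.
// No I/O happens in here, so the threads never wait on each other.
fn compute_parallel(input: &[u64], output: &mut [u64], workers: usize) {
    assert_eq!(input.len(), output.len());
    if input.is_empty() {
        return;
    }
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk = (input.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Each thread gets its own disjoint slice of input and output.
        for (src, dst) in input.chunks(chunk).zip(output.chunks_mut(chunk)) {
            s.spawn(move || {
                for (i, o) in src.iter().zip(dst.iter_mut()) {
                    *o = kernel(*i);
                }
            });
        }
    });
}

// Build the whole output text in memory, one value per line.
fn render(values: &[u64]) -> String {
    let mut text = String::new();
    for v in values {
        text.push_str(&v.to_string());
        text.push('\n');
    }
    text
}

fn main() -> io::Result<()> {
    let input: Vec<u64> = (0..1_000).collect();
    let mut output = vec![0u64; input.len()];
    compute_parallel(&input, &mut output, 4);
    // One write at the end instead of one per iteration.
    io::stdout().lock().write_all(render(&output).as_bytes())
}
```

The worker count of 4 is arbitrary; the point is only that the hot loop touches nothing but the two arrays.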

Standard IO has locks in it, so you end up with all calculations grinding to a halt while each thread waits for another to finish its output.
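
To make the locking point concrete, here is a small sketch, again in Rust only as an example and not necessarily how your runtime behaves. In Rust, each `println!` call takes the stdout lock on its own, so the usual workaround is to take the lock once and push everything through a buffered writer. `write_results` is a hypothetical helper name:

```rust
use std::io::{self, BufWriter, Write};

// Write every value through a single buffered writer, one per line.
// Generic over `Write` so it works for stdout, a file, or a byte buffer.
fn write_results<W: Write>(out: W, values: &[u64]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for v in values {
        writeln!(out, "{}", v)?;
    }
    // Push anything still sitting in the buffer to the underlying writer.
    out.flush()
}

fn main() -> io::Result<()> {
    let results: Vec<u64> = (0..5).map(|x| x * 2).collect();
    // Take the stdout lock once, rather than once per line printed.
    let stdout = io::stdout();
    write_results(stdout.lock(), &results)
}
```

Called after the parallel part has finished, this keeps the lock entirely out of the worker threads.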
