First of all, thanks for the very detailed answer! I really appreciate it!
I have been using pretty much the same benchmark code, so times should be
rather accurate.
I tried your examples and they seem to work (btw, on macOS, the -fopenmp flag
seems redundant for clang, as it supposedly has built-in support for OpenMP,
unless I'm mistaken).
The times I'm getting are pretty much these (for 1 billion repetitions):
Wall time for normal loop: 16.175 s
Wall time for parallelized OpenMP: 16.178 s
Wall time for parallelized Nim spawn: 0.0 s
Note: for fewer repetitions, the OpenMP-based benchmark seems to be around
20-30% faster.
Now, I have a question regarding the last solution: is there any way to sync
all the spawned tasks? (I mean, to know when all of them have finished.)
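
To make the question concrete, here is a minimal sketch of the kind of thing I'm after. It assumes the threadpool module's sync() acts as a barrier for everything started with spawn, which is how I read its docs; busyWork and the task count of 4 are just hypothetical stand-ins for the real benchmark. If the timer stops before the tasks are done, that would also explain the 0.0 s reading above.

```nim
# Compile with: nim c --threads:on -d:release sync_sketch.nim
# Note: newer Nim versions may print a deprecation warning for std/threadpool.
import std/[threadpool, times]

proc busyWork(reps: int) =
  # Hypothetical stand-in for the benchmarked loop body.
  var acc = 0
  for i in 0 ..< reps:
    acc += i mod 7

let start = epochTime()

# Hand four independent tasks to the thread pool (the count is arbitrary).
for _ in 0 ..< 4:
  spawn busyWork(1_000_000)

# Assumed barrier: block here until every spawned task has finished,
# so the elapsed time covers the work and not just the spawn calls.
sync()

echo "Wall time with sync: ", epochTime() - start, " s"
```

If the tasks returned values instead, I suppose each spawn would give back a FlowVar and reading it with ^ would block on that one task, but I haven't tried that.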