First of all, thanks for the very detailed answer! I really appreciate it!

I have been using pretty much the same benchmark code, so times should be 
rather accurate.

I tried your examples and they seem to work (btw, on macOS, the -fopenmp 
flag seems redundant for clang, as it supposedly has built-in support for 
OpenMP, unless I'm mistaken).

The times I'm getting are roughly these (for 1 billion repetitions):

    Wall time for normal loop: 16.175 s
    Wall time for parallelized OpenMP: 16.178 s
    Wall time for parallelized Nim spawn: 0.0 s

Note: for fewer repetitions, the OpenMP-based benchmark seems to be around 
20-30% faster.

Now, I have a question regarding the last solution: is there any way to sync 
all the spawned tasks (I mean, to know when all of them have finished)?
