Hey everyone, I've been looking at parallel programming in Julia and have been getting some very unexpected results, with rather bad performance as a consequence. Sadly, I've run out of ideas about what could be going on, having disproved every idea I had. Hence this post :)
I was able to construct a similar (simpler) example which exhibits the same behavior (attached file). The example is a very naive and suboptimal implementation in many ways (the actual code is much more optimal), but that's not the issue. The issue I'm trying to investigate is the big difference in worker time when a single worker is active and when multiple are active.

Ideas I disproved:
- Julia processes pinned to a single core
- Julia process uses multiple threads to do the work, and processes are fighting for the cores
- not enough cores on the machine (there are plenty)
- htop nicely shows 4 Julia processes working on different cores
- there is no communication at the application level stalling anyone

All I'm left with now is that Julia is doing some hidden synchronization somewhere.

Any input is appreciated. Thanks in advance.

Kind regards,
Tom
parallel-test.jl
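For readers without the attachment, the comparison described above (worker time with one worker active versus several active at once) can be sketched roughly as follows. This is not the attached parallel-test.jl: the `busy_work` kernel, the iteration count `n`, and the choice of 4 local workers are all assumptions made for illustration, and the sketch uses the `Distributed` standard library from current Julia versions.

```julia
using Distributed

# Assumption: 4 local worker processes; only added if none exist yet.
nprocs() == 1 && addprocs(4)

@everywhere function busy_work(n)
    # Hypothetical CPU-bound kernel standing in for the real computation.
    # The time is measured on the worker itself, so it excludes
    # communication with the master process.
    t0 = time()
    s = 0.0
    for i in 1:n
        s += sin(i) * cos(i)
    end
    return (time() - t0, s)   # s is returned so the loop cannot be elided
end

n = 20_000_000   # assumed problem size

# Case 1: only one worker is active at any moment.
solo = [remotecall_fetch(busy_work, w, n)[1] for w in workers()]

# Case 2: all workers are started first, then their results are collected.
futures = [remotecall(busy_work, w, n) for w in workers()]
together = [fetch(f)[1] for f in futures]

println("per-worker time, one active:  ", solo)
println("per-worker time, all active:  ", together)
```

If the workers were fully independent, the two sets of timings would be expected to be close to each other; a large gap between them is the behavior this post is asking about.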
