There's something weird happening in my recently parallelized code. When I
run it without adding worker processes first, the results are completely
off. After some investigation I found that this is due to assignment
operations going wrong: results of computations are assigned to different
Arrays than the intended ones. A small working example illustrating the
point:

    x1 = linspace(1, 3, 3)
    x2 = linspace(1, 3, 3)
    x3 = linspace(1, 3, 3)

    function getresults(x1::Array, x2::Array, x3::Array)
        result1 = SharedArray(Float64, (3,3,3))
        result2 = similar(result1)
        result3 = similar(result1)
        @sync @parallel for a=1:3
            for b=1:3
                for c=1:3
                    result1[a,b,c] = x1[a]*x2[b]*x3[c]
                    result2[a,b,c] = sqrt(x1[a]*x2[b]*x3[c])
                    result3[a,b,c] = (x1[a]*x2[b]*x3[c])^2
                end
            end
        end
        return sdata(result1), sdata(result2), sdata(result3)
    end

    (r1,r2,r3) = getresults(x1, x2, x3)
    nprocs()==CPU_CORES || addprocs(CPU_CORES-1)
    (r1_par,r2_par,r3_par) = getresults(x1, x2, x3)

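To make the mismatch easy to check, each returned Array can be compared against a serial reference built from plain Arrays. This is only a rough sketch: the helper name `reference` is made up for illustration, and the calls `sqrt(p)` and `p.^2` assume the 0.3-era behaviour where these apply elementwise to an Array.

```julia
# Hypothetical helper: builds the expected results with plain Arrays,
# without SharedArray or @parallel, so it does not depend on worker count.
function reference(x1, x2, x3)
    p = Float64[x1[a]*x2[b]*x3[c] for a=1:3, b=1:3, c=1:3]
    return p, sqrt(p), p.^2   # elementwise on Arrays (assumes Julia 0.3)
end

(ref1, ref2, ref3) = reference(x1, x2, x3)

# Results from the run before addprocs
println("single process: ", r1 == ref1, " ", r2 == ref2, " ", r3 == ref3)
# Results from the run after addprocs
println("with workers:   ", r1_par == ref1, " ", r2_par == ref2, " ", r3_par == ref3)
# Direct check for the aliasing symptom
println("r2 == r3: ", r2 == r3)
```
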
When I run this on my system (Julia v0.3.6), the parallelized version works
as intended, while running the code without adding workers first gives the
expected results for r1 and r3, but r2 holds the same values as r3. The
behaviour in my original problem is similar: the code returns three Arrays,
but when it is run without additional workers, all three Arrays hold the
same contents.
Is there something in the @sync or @parallel macros that causes this? How
should code be written to ensure that it works with both one and multiple
cores?