On 13.03.2015 at 16:20, Pieter Barendrecht <[email protected]> wrote:
> Thanks! I tried both approaches you suggested. Some results using
> SharedArrays (100,000 simulations):
>
> #workers  #time
> 1         ~120s
> 3         ~42s
> 6         ~40s
>
> Short question. The first print statement after the for-loop is already
> executed before the for-loop ends. How do I prevent this from happening?
>
> Some results using the other approach (again 100,000 simulations):
>
> #workers  #time
> 1         ~118s
> 2         ~60s
> 3         ~42s
> 4         ~38s
> 6         ~40s

Could you post a simplified code snippet? Either here or in a gist. It is difficult to know what exactly you are doing ;-)

> Couple of questions. My equivalent of "myfunc_pure()" also requires a second
> argument.

Is that argument changing, or is it there to switch between different algorithms etc.?

> In addition, I don't make use of the "startindex" argument in the function.
> What's the common approach here? Next, there are actually multiple variables
> that should be returned, not just "result".

You can always return (a,b,c) instead of a, i.e. a tuple. The function you provide to reduce then has the following signature: myreducer(a::Tuple, b::Tuple). Combine the tuples, and again return a tuple.

> Overall, I'm a bit surprised that using more than 3 or 4 workers does not
> decrease the running time. Any ideas? I'm using Julia 0.3.6 on a 64bit Arch
> Linux system, Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz.

It can be any number of things: the memory bandwidth could be the limiting factor, or the computation is actually nicely sped up and a lot of what you see is communication overhead. In that case, work on chunks of data / batches of iterations, i.e. don't pmap over millions of things but only a couple dozen. Looking at the code might shed some light.

> On Friday, March 13, 2015 at 8:37:19 AM UTC, René Donner wrote:
>
> > Perhaps SharedArrays are what you need here?
> > http://docs.julialang.org/en/release-0.3/stdlib/parallel/?highlight=sharedarray#Base.SharedArray
> >
> > Reading from a shared array in workers is fine, but when different workers
> > try to update the same part of that array you will get racy behaviour and
> > most likely not the correct result.
> >
> > Can you somehow re-formulate your problem along these lines, using a map
> > and reduce approach with a pure function?
> >
> > @everywhere function myfunc_pure(startindex)
> >     result = zeros(Int,10)
> >     for i in startindex + (0:19)  # 20 iterations
> >         result[mod(i,length(result))+1] += 1
> >     end
> >     result
> > end
> >
> > reduce(+, pmap(myfunc_pure, 1:5))  # 5 blocks of 20 iterations
> >
> > Like this you don't have a shared mutable state and thus no risk for
> > mess-ups.
> >
> > On 13.03.2015 at 00:56, Pieter Barendrecht <[email protected]> wrote:
> >
> > > I'm wondering how to save data/results in a parallel for-loop. Let's assume
> > > there is a single Int64 array, initialised using zeros() before starting
> > > the for-loop. In the for-loop (typically ~100,000 iterations, that's the
> > > reason I'm interested in parallel processing) the entries of this Int64
> > > array should be increased (based on the results of an algorithm that's
> > > invoked in the for-loop).
> > >
> > > Everything works fine when using just a single proc, but I'm not sure how
> > > to modify the code such that, when using e.g. addprocs(4), the data/results
> > > stored in the Int64 array can be processed once the for-loop ends. The
> > > algorithm (a separate function) is available to all procs (using the
> > > require() function). Just using the Int64 array in the for-loop (using
> > > @parallel for k=1:100000) does not work as each proc receives its own copy,
> > > so after the for-loop it contains just zeros (as illustrated in a set of
> > > slides on the Julia language). I guess it involves @spawn and fetch()
> > > and/or pmap(). Any suggestions or examples would be much appreciated :).
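To make the tuple-return-and-reduce suggestion above concrete, here is a minimal sketch. The function and reducer names are made up for illustration, and the `using Distributed` line is only needed on Julia ≥ 0.7 (on the 0.3 used in this thread, `@everywhere` and `pmap` live in Base):

```julia
using Distributed   # Julia >= 0.7 only; on 0.3 these names are in Base

# Each block of iterations returns several results as a tuple, and a
# reducer with the signature myreducer(a::Tuple, b::Tuple) combines them.
# pmap is called over a couple dozen blocks, not over every iteration.
@everywhere function myfunc_tuple(startindex)
    counts = zeros(Int, 10)
    total  = 0
    for i in startindex .+ (0:19)             # 20 iterations per block
        counts[mod(i, length(counts)) + 1] += 1
        total += 1
    end
    (counts, total)                           # a tuple instead of a single value
end

# Combine two tuples elementwise, and again return a tuple
myreducer(a::Tuple, b::Tuple) = (a[1] + b[1], a[2] + b[2])

counts, total = reduce(myreducer, pmap(myfunc_tuple, 1:24))  # 24 blocks
```

With no extra workers added, `pmap` simply runs serially, so the snippet behaves the same with or without `addprocs`.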

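For the original question of accumulating an array across a parallel for-loop, the reducing form of the parallel loop macro is another option besides pmap: the last expression of the loop body is combined across all iterations, so no shared mutable array is needed. A sketch, with a stand-in for the real algorithm; on current Julia the macro is `@distributed` from the Distributed stdlib, while on the 0.3 used in this thread the same pattern is written `@parallel (+) for ...` with no import:

```julia
using Distributed   # Julia >= 0.7; on 0.3 write `@parallel (+) for` instead

# Each iteration produces a small array; (+) combines these arrays across
# all iterations and workers, so there is no shared mutable state.
totals = @distributed (+) for k = 1:100_000
    contribution = zeros(Int, 10)
    contribution[mod(k, 10) + 1] += 1   # stand-in for the real algorithm
    contribution                        # value handed to the (+) reduction
end
```

Unlike the plain `@parallel for` mentioned in the original question, the reducing form returns the combined result directly, so nothing is lost when each proc works on its own copy.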