Are you opening files via open or mmap in any of the functions that learningExperimentRun calls?
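If so, one common cause of "too many files open" is handles that never get closed on the workers. As a rough sketch (the file name and the write are placeholders, not anything from your code), the do-block form of open closes the handle even when the body throws:

# Placeholder example only: the do-block closes the handle automatically,
# even if an error is thrown inside, so repeated calls cannot leak descriptors.
open("results.txt", "a") do io
    println(io, "run finished")
end

# The manual form leaks a descriptor if an exception (or an early return)
# occurs between open and close.
io = open("results.txt", "a")
println(io, "run finished")
close(io)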
On Wed, Jun 1, 2016 at 11:42 AM, Martha White <[email protected]> wrote:
> I am having difficulty understanding how to use pmap in Julia. I am a
> reasonably experienced matlab and c programmer. However, I am new to Julia
> and to using parallel functions. I am running an experiment with nested for
> loops, benchmarking different algorithms. In the inner loop, I am running
> the algorithms across multiple trials. I would like to parallelize this
> inner loop (as the outer iteration I can easily run as multiple jobs on a
> cluster). The code looks like:
>
> effNumCores = 3
> procids = addprocs(effNumCores)
>
> # This has to be added so that each run has access to these function definitions
> @everywhere include("experimentUtils.jl")
>
> # Initialize array of RMSE
> fill!(runErrors, 0.0);
>
> # Split up runs across number of cores
> outerloop = floor(Int, numRuns / effNumCores) + 1
> r = 1
> rend = effNumCores
> for i = 1:outerloop
>     rend = min(r + effNumCores - 1, numRuns)
>
>     # Empty RMSE passed, since it is created and returned in pmap_errors
>     pmap_errors = pmap(r -> learningExperimentRun(mdp, hordeOfD, stepData,
>         alpha, lambda, beta, numAgents, numSteps, Array{Float64}(0,0), r), r:rend)
>     for j = 1:(rend-r+1)
>         runErrors[:,:,MEAN_IND] += pmap_errors[j]
>         runErrors[:,:,VAR_IND]  += pmap_errors[j].^2
>     end
>     r += effNumCores
> end
> rmprocs(procids)
>
> The function called above is defined in a separate file called
> experimentUtils.jl, as
>
> function learningExperimentRun(mdp::MDP, hordeOfD::horde, stepData::transData,
>         alpha::Float64, lambda::Float64, beta::Float64,
>         numAgents::Int64, numSteps::Int64, RMSE::Array{Float64, 2}, runNum::Int64)
>     # if RMSE is empty, then initialize; this is empty for the parallel version
>     if (isempty(RMSE))
>         RMSE = zeros(Float64, numAgents, numSteps)
>     else
>         fill!(RMSE, 0.0)
>     end
>
>     srand(runNum)
>
>     agentInit(hordeOfD, mdp, alpha, beta, lambda, BETA_ETD)
>     getLearnerErrors(hordeOfD, mdp, RMSE, 1)
>     mdpStart(mdp, stepData)
>     for i = 2:numSteps
>         mdpStep(mdp, stepData)
>         updateLearners(stepData, mdp, hordeOfD)
>         getLearnerErrors(hordeOfD, mdp, RMSE, i)
>     end
>
>     return RMSE
> end
>
> When I try to run this, I get a large number of workers and errors stating
> that I have too many files open. I believe I must be doing something
> seriously wrong. If anyone could help to parallelize this code in Julia,
> that would be fantastic. I am not tied to pmap, but after reading a bit, it
> seemed to be the right function to use.
>
> I should further add that I have an additional loop splitting runs over
> cores, even though pmap could do that for me. I did this because pmap_errors
> then becomes an array with numRuns entries (which could be in the 100s). By
> splitting it up into loops, the returned pmap_errors has size at most the
> number of cores. I am hoping that this memory then gets re-used when
> starting the next loop over cores.
>
> I tried at first avoiding this by using a distributed array for runErrors.
> But this was not clearly documented, so I abandoned that approach.
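Separately from the file-handle question: if the only reason for the manual chunking loop is to avoid holding one RMSE array per run, a parallel reduction sidesteps that entirely. Below is a rough sketch only, not code from this thread: it assumes the same mdp, hordeOfD, stepData, alpha, lambda, beta, numAgents, numSteps, MEAN_IND, VAR_IND, and learningExperimentRun quoted above, and uses the @parallel (+) reduction from Julia 0.4/0.5 (renamed @distributed in later versions) in place of pmap.

effNumCores = 3
procids = addprocs(effNumCores)
@everywhere include("experimentUtils.jl")

# Each iteration returns a numAgents x numSteps x 2 array holding that run's
# RMSE and its elementwise square; (+) sums these arrays across runs as they
# finish, so the master never stores one result per run.
sums = @parallel (+) for r = 1:numRuns
    rmse = learningExperimentRun(mdp, hordeOfD, stepData, alpha, lambda, beta,
                                 numAgents, numSteps, Array{Float64}(0,0), r)
    cat(3, rmse, rmse.^2)
end

runErrors[:,:,MEAN_IND] = sums[:,:,1]
runErrors[:,:,VAR_IND]  = sums[:,:,2]

rmprocs(procids)

Stacking the RMSE and its square along a third dimension is just a way to accumulate both the mean and variance terms with a single (+) reduction; two separate reductions would work as well.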
