I am having difficulty understanding how to use pmap in Julia. I am a
reasonably experienced MATLAB and C programmer, but I am new to Julia
and to parallel programming. I am running an experiment with nested for
loops that benchmarks different algorithms. In the inner loop, I run the
algorithms across multiple trials. I would like to parallelize this inner
loop (the outer iterations I can easily run as separate jobs on a
cluster). The code looks like:

effNumCores = 3
procids = addprocs(effNumCores)

# This has to be added so that each run has access to these function definitions
@everywhere include("experimentUtils.jl")

# Initialize array of RMSE
fill!(runErrors, 0.0);

# Split up runs across number of cores
outerloop = floor(Int, numRuns / effNumCores)+1
r = 1
rend = effNumCores
for i = 1:outerloop
    rend = min(r+effNumCores-1, numRuns)

    # Pass an empty RMSE, Array{Float64}(0,0); the matrix is created inside
    # learningExperimentRun and returned into pmap_errors
    pmap_errors = pmap(r -> learningExperimentRun(mdp, hordeOfD, stepData,
                           alpha, lambda, beta, numAgents, numSteps,
                           Array{Float64}(0,0), r), r:rend)
    for j=1:(rend-r+1)
        runErrors[:,:,MEAN_IND] += pmap_errors[j]
        runErrors[:,:,VAR_IND] += pmap_errors[j].^2
    end
    r += effNumCores
end
rmprocs(procids)
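For comparison, here is a minimal sketch of letting pmap schedule every run itself, with no manual batching over cores. It assumes Julia 1.x, where addprocs/pmap live in the Distributed standard library and srand was replaced by Random.seed!. toyRun is a hypothetical stand-in for learningExperimentRun that just returns a random 2 x 5 error matrix. Workers are started once, reused for every run, and removed at the end.

```julia
using Distributed

addprocs(2)                      # start the workers once, up front

@everywhere using Random

# Hypothetical stand-in for learningExperimentRun: one run -> one error matrix
# (2 agents x 5 steps). A local generator seeded from runNum plays the role
# of srand(runNum).
@everywhere function toyRun(runNum::Int)
    rng = MersenneTwister(runNum)
    return rand(rng, 2, 5)
end

numRuns = 10

# pmap hands each run to whichever worker is free; no batching loop needed.
runResults = pmap(toyRun, 1:numRuns)

# Accumulate the sum and the sum of squares on the master process.
errSum   = zeros(2, 5)
errSumSq = zeros(2, 5)
for E in runResults
    errSum   .+= E
    errSumSq .+= E .^ 2
end

rmprocs(workers())               # release the worker processes
```

Note that this still keeps all numRuns matrices in runResults before summing them.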

The learningExperimentRun function called in the loop above is defined in
a separate file, experimentUtils.jl, as follows:

function learningExperimentRun(mdp::MDP, hordeOfD::horde, stepData::transData,
                               alpha::Float64, lambda::Float64, beta::Float64,
                               numAgents::Int64, numSteps::Int64,
                               RMSE::Array{Float64, 2}, runNum::Int64)
    # If RMSE is empty, allocate it here; it is always empty in the parallel version
    if isempty(RMSE)
        RMSE = zeros(Float64, numAgents, numSteps)
    else
        fill!(RMSE, 0.0)
    end

    srand(runNum)

    agentInit(hordeOfD, mdp, alpha, beta, lambda, BETA_ETD)
    getLearnerErrors(hordeOfD, mdp, RMSE, 1)
    mdpStart(mdp, stepData)
    for i = 2:numSteps
        mdpStep(mdp, stepData)
        updateLearners(stepData, mdp, hordeOfD)
        getLearnerErrors(hordeOfD, mdp, RMSE, i)
    end

    return RMSE
end
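Because each run seeds its random numbers from runNum, the results should not depend on which worker executes which run. A small sketch to check that, assuming Julia 1.x (Distributed and Random standard libraries). toyRun is a hypothetical stand-in, and it uses a local MersenneTwister(runNum) instead of reseeding the global generator with srand; that is a swapped-in design choice, not what the function above does.

```julia
using Distributed
addprocs(2)

@everywhere using Random

# Hypothetical stand-in for learningExperimentRun. The local generator seeded
# from runNum replaces srand(runNum), so no global RNG state is involved.
@everywhere function toyRun(runNum::Int)
    rng = MersenneTwister(runNum)
    return rand(rng, 2, 5)
end

parallel = pmap(toyRun, 1:6)   # runs spread over the workers
serial   = map(toyRun, 1:6)    # the same runs on the master process
println(parallel == serial)    # true: each result depends only on runNum

rmprocs(workers())
```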

When I run my experiment, a large number of workers are started and I get
errors saying that too many files are open. I believe I must be doing
something seriously wrong. If anyone could help me parallelize this code
in Julia, that would be fantastic. I am not tied to pmap, but after reading
a bit, it seemed to be the right function to use.


I should add that I use an additional loop to split the runs over cores,
even though pmap could do that for me. I did this because otherwise
pmap_errors becomes an array of numRuns matrices (numRuns could be in the
hundreds). By splitting the runs into batches, the returned pmap_errors
holds at most as many matrices as there are cores. I am hoping that this
memory then gets reused when the next batch over the cores starts.
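If the goal is only the sum and sum of squares over runs, a reducing parallel loop avoids holding one matrix per run at all. A sketch assuming Julia 1.x and the Distributed standard library's @distributed macro with a (+) reducer; toyRun is again a hypothetical stand-in for learningExperimentRun, and accumulateErrors is a hypothetical name. Each worker keeps a running sum over its share of the runs, and the master only adds one partial result per worker.

```julia
using Distributed
addprocs(2)

@everywhere using Random

# Hypothetical stand-in for learningExperimentRun: one run -> one error matrix.
@everywhere function toyRun(runNum::Int)
    rng = MersenneTwister(runNum)
    return rand(rng, 2, 5)          # numAgents x numSteps
end

# Slice 1 accumulates sum(E), slice 2 accumulates sum(E.^2), mirroring the
# MEAN_IND / VAR_IND layout of runErrors in the original code.
function accumulateErrors(numRuns::Int)
    return @distributed (+) for r in 1:numRuns
        E = toyRun(r)
        cat(E, E .^ 2; dims = 3)
    end
end

runErrors = accumulateErrors(100)
rmprocs(workers())
```

One design trade-off: @distributed splits the range into fixed chunks up front, whereas pmap hands out runs dynamically, so pmap balances load better when individual runs take very different amounts of time.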

At first I tried to avoid storing one matrix per run by using a
distributed array for runErrors, but that was not clearly documented, so
I abandoned that approach.
