Are you opening files via open or mmap in any of the functions
that learningExperimentRun calls?
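
If so, make sure every handle actually gets closed on the workers; with
hundreds of runs it is easy to leak file descriptors. The do-block form of
open closes the file even if the body throws. A minimal sketch (the file
name and readExperimentData are made-up placeholders):

data = open("someInput.dat", "r") do io
    readExperimentData(io)   # hypothetical reader; substitute your own
end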

On Wed, Jun 1, 2016 at 11:42 AM, Martha White <[email protected]>
wrote:

> I am having difficulty understanding how to use pmap in Julia. I am a
> reasonably experienced Matlab and C programmer; however, I am new to Julia
> and to using parallel functions. I am running an experiment with nested for
> loops, benchmarking different algorithms. In the inner loop, I run the
> algorithms across multiple trials. I would like to parallelize this inner
> loop (the outer iterations I can easily run as multiple jobs on a cluster).
> The code looks like:
>
> effNumCores = 3
> procids = addprocs(effNumCores)
>
> # This has to be added so that each worker has access to these
> # function definitions
> @everywhere include("experimentUtils.jl")
>
> # Initialize the array of RMSE sums (the third dimension is indexed
> # by MEAN_IND and VAR_IND)
> runErrors = zeros(Float64, numAgents, numSteps, 2)
>
> # Split the runs into chunks of at most effNumCores
> outerloop = cld(numRuns, effNumCores)  # ceiling division
> r = 1
> for i = 1:outerloop
>     rend = min(r + effNumCores - 1, numRuns)
>
>     # An empty RMSE (Array{Float64}(0,0)) is passed, since it is
>     # created and returned by learningExperimentRun
>     pmap_errors = pmap(run -> learningExperimentRun(mdp, hordeOfD,
>         stepData, alpha, lambda, beta, numAgents, numSteps,
>         Array{Float64}(0,0), run), r:rend)
>     for j = 1:(rend-r+1)
>         runErrors[:,:,MEAN_IND] += pmap_errors[j]
>         runErrors[:,:,VAR_IND] += pmap_errors[j].^2
>     end
>     r += effNumCores
> end
> rmprocs(procids)
>
> The function called above is defined in a separate file called
> experimentUtils.jl, as
>
> function learningExperimentRun(mdp::MDP, hordeOfD::horde, stepData::transData,
>                                alpha::Float64, lambda::Float64, beta::Float64,
>                                numAgents::Int64, numSteps::Int64,
>                                RMSE::Array{Float64, 2}, runNum::Int64)
>     # If RMSE is empty, initialize it; it is passed empty in the
>     # parallel version
>     if isempty(RMSE)
>         RMSE = zeros(Float64, numAgents, numSteps)
>     else
>         fill!(RMSE, 0.0)
>     end
>
>     # Seed the RNG with the run number so each run is reproducible
>     srand(runNum)
>
>     agentInit(hordeOfD, mdp, alpha, beta, lambda, BETA_ETD)
>     getLearnerErrors(hordeOfD, mdp, RMSE, 1)
>     mdpStart(mdp, stepData)
>     for i = 2:numSteps
>         mdpStep(mdp, stepData)
>         updateLearners(stepData, mdp, hordeOfD)
>         getLearnerErrors(hordeOfD, mdp, RMSE, i)
>     end
>
>     return RMSE
> end
>
> When I try to run this, I end up with a large number of workers and get
> errors stating that I have too many files open. I believe I must be doing
> something seriously wrong. If anyone could help me parallelize this code in
> Julia, that would be fantastic. I am not tied to pmap, but after reading a
> bit, it seemed to be the right function to use.
>
>
> I should further add that I have an additional loop splitting the runs over
> cores, even though pmap could do that for me. I did this because pmap_errors
> would otherwise become an array of length numRuns (which could be in the
> hundreds). By splitting the runs into chunks, the returned pmap_errors has
> length at most the number of cores. I am hoping that this memory then gets
> re-used when starting the next loop over cores.
>
> At first I tried to avoid this by using a distributed array for runErrors,
> but that approach was not clearly documented, so I abandoned it.
>
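
On the memory point: if all you ultimately need are the running sums, you
can let the reduction happen on the workers instead of collecting per-run
arrays at all. A rough, untested sketch using @parallel with a (+)
reducer, assuming (as in your inner loop) that the MEAN_IND and VAR_IND
slices just accumulate the errors and the squared errors:

# Each worker reduces its own runs with +, so only a handful of
# intermediate arrays are alive at any time, regardless of numRuns.
sums = @parallel (+) for run = 1:numRuns
    err = learningExperimentRun(mdp, hordeOfD, stepData, alpha, lambda,
        beta, numAgents, numSteps, Array{Float64}(0,0), run)
    cat(3, err, err.^2)   # stack the error and squared-error terms
end
runErrors[:,:,MEAN_IND] = sums[:,:,1]
runErrors[:,:,VAR_IND] = sums[:,:,2]

That would remove both the chunking loop and the pmap_errors array
entirely.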

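And on the distributed-array route: the DArray type now lives in the
DistributedArrays.jl package, which may be why the documentation was hard
to find. I have not tried it with your setup, but the basic calls look
like this:

@everywhere using DistributedArrays

d = dzeros(numAgents, numSteps)   # a distributed array of zeros
# each worker reads and writes its own piece via localpart(d)
A = convert(Array, d)             # gather the pieces into a plain Array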