I'm trying to write code for Sun Grid Engine (SGE), although I think the 
general idea applies to any addprocs call.  I would like to be able to request 
a gazillion nodes, and start using each shortly after it becomes available.

An example of what I want is roughly this code:

   for j = 1:1000000
      @async begin
         # request one Sun Grid Engine process
         new_worker = addprocs_sge(1)
         # bring the new worker into the work
         worker_init(new_worker)
      end
   end

The problem with this code is that addprocs (and thus addprocs_sge) seems 
to behave like one big critical section, so the @async has effectively no 
effect, and the procs get added serially.  The big problem with this 
is that it might take a day before I get even the first worker (when 
addprocs_sge returns).  With this code, I would wait that day and only then 
would I start the request for the second worker, which might take 
approximately another day, and so on.  I want to get all my requests in the 
queue, right from the start, so once I get the first worker, I'm also next 
in line for the second, third, ...



The alternative code below effectively gets the requests for all workers 
into the queue right from the start:

   # request 1,000,000 Sun Grid Engine processes in a single call
   new_workers = addprocs_sge(1000000)
   worker_init(new_workers)

but the problem with it is that I don't get any work done until all 
1,000,000 processes become available, because the call to addprocs doesn't 
return until it has everything (even though nodes on SGE start to become 
owned and blocked by me while it's trying to collect the whole million).
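
To make the behavior I'm after concrete, here is a toy simulation with no 
cluster involved.  It is only a sketch of the pattern, not a fix: 
fake_request and run_simulation are hypothetical names, fake_request stands 
in for addprocs_sge(1), and the delays are made up.  Whether real addprocs 
calls can overlap like this is exactly what I'm asking.

```julia
# Toy simulation only: no cluster involved. `fake_request` is a hypothetical
# stand-in for addprocs_sge(1); it "grants" worker `id` after `delay` seconds.
function fake_request(id, delay)
    sleep(delay)
    return id
end

# Submit every request at once, then start using each worker as it is granted.
# Returns the worker ids in the order they were put to work.
function run_simulation(delays)
    n = length(delays)
    ready = Channel{Int}(n)              # granted workers queue up here
    for j in 1:n
        # all requests are in flight together
        @async put!(ready, fake_request(j, delays[j]))
    end
    started = Int[]
    for _ in 1:n
        w = take!(ready)                 # waits only for the next grant, not for all n
        push!(started, w)                # stand-in for worker_init(w)
    end
    return started
end

println(run_simulation([0.3, 0.1, 0.2]))
```

In this simulation the workers are put to work in the order their requests 
finish, not the order they were submitted, and nothing waits for the 
slowest request before starting on the fastest.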

Is there a way around this?  (I'm using Julia 0.4.3.  I would love to 
upgrade, but I use a large amount of code that I don't control and isn't 
going to be updated any time soon.  At the same time, I'd be interested in 
solutions related to other versions regardless.)  Thanks.

Ryan
