I'm trying to write code for Sun Grid Engine (SGE), although I think the general idea applies to any addprocs variant. I would like to be able to request a gazillion nodes and start using each one shortly after it becomes available.
An example of what I want is roughly this code:

    for j = 1:1000000
        @async begin
            new_worker = addprocs_sge(1)  # request to add one Sun Grid Engine process
            worker_init(new_worker)       # brings the new worker into the work
        end
    end

The problem with this code is that addprocs (and thus addprocs_sge) seems to be something like one big critical section, so the @async has effectively no effect and the procs get added serially. The big problem with this is that it might take a day before I get even the first worker (when addprocs_sge returns). With this code, I would wait that day, and only then would I start the request for the second worker, which might take approximately another day, and so on. I want to get all my requests into the queue right from the start, so that once I get the first worker, I'm also next in line for the second, third, ...

The alternative code below effectively gets the requests for all workers into the queue right from the start:

    new_workers = addprocs(1000000)  # request to add 1,000,000 Sun Grid Engine processes
    worker_init(new_workers)

The problem with it is that I don't get any work done until all 1,000,000 processes become available, because the call to addprocs doesn't return until it has everything (even though nodes on SGE start to become owned and blocked by me while it's trying to collect the whole million).

Is there a way around this?

(I'm using Julia 0.4.3. I would love to upgrade, but I use a large amount of code that I don't control and that isn't going to be updated any time soon. At the same time, I'd be interested in solutions for other versions regardless.)

Thanks.
Ryan