Hello, 

On Thursday, December 19, 2013 11:31:18 PM UTC+1, Brendan O'Connor wrote:
>
> Hi, by my reading of the Julia manual, the following is not currently 
> possible.  I just wanted to check, am I correct?
>
> I'd like to create a large shared state, like a 10GB vector, and let 
> multiple worker processes all be able to access it.  Read-only access is 
> fine.  This is a common pattern for lots of machine learning algorithms 
> that I write (where the large shared state is the model parameters; for 
> example, the workers might compute model likelihood on different subsets of 
> the full dataset).
>
I had to do a similar thing a while ago (long before Julia), training 
multiple SVMs with a big gram matrix of which everything but one row and one 
column was shared.  The interface was plain C, and I did it by just 
issuing a plain old fork() and letting the OS manage the memory with 
copy-on-write paging.  The unique row and column of the gram matrix would 
be automagically allocated by the copy-on-write in the child 
processes.  This way I could use almost all of the machine's memory for 
a single SVM problem, and run as many problems in parallel as there were 
cores. 

Perhaps it is possible to wrap libc's fork() in a ccall.  This might happen 
without Julia even being aware of it... Getting data back from the child 
processes may be a bit tricky; I suppose you can always use the file system 
(that's what I did at the time) or communicate through a pipe or socket.
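
Roughly something like the untested sketch below, I imagine.  The fork_work 
helper, the result file, and the serialize/deserialize round-trip are just 
placeholders of mine, and forking a running Julia process has caveats (libuv, 
threads) that this completely glosses over:

    using Serialization  # for shuttling the child's result through the file system

    # Large read-only state; after fork() the child sees these pages
    # copy-on-write, so nothing is duplicated up front.
    const big_vector = rand(10^6)

    # Run f(big_vector) in a forked child and write the result to resultfile.
    # Returns the child's pid in the parent.
    function fork_work(f, resultfile)
        pid = ccall(:fork, Cint, ())          # plain libc fork(); Julia is not aware of it
        if pid == 0
            open(resultfile, "w") do io
                serialize(io, f(big_vector))  # hand the result back via the file system
            end
            ccall(:_exit, Cvoid, (Cint,), 0)  # leave without running the parent's exit hooks
        end
        return pid
    end

    # Usage: fork a worker, wait for it, read its result back.
    pid = fork_work(sum, "/tmp/result.bin")
    ccall(:waitpid, Cint, (Cint, Ptr{Cint}, Cint), pid, C_NULL, 0)
    result = open(deserialize, "/tmp/result.bin")

The child sees big_vector through the same copy-on-write pages as the parent, 
just as in the C case; the safest policy is to do as little as possible in the 
child before _exit().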

Cheers, 

---david 
 

> Currently, if you try to have multiple @parallel workers/threads access a 
> global array, it gets copied for each one.  (The manual says this is the 
> case, and I confirmed it in a test.)
>
> There seem to be some issues and pull requests that might be related, e.g. 
> https://github.com/JuliaLang/julia/issues/1790
>
> Thanks -- Brendan
> --
> brenocon.com
>
>
