Hello,

On Thursday, December 19, 2013 11:31:18 PM UTC+1, Brendan O'Connor wrote:
> Hi, by my reading of the Julia manual, the following is not currently
> possible. I just wanted to check, am I correct?
>
> I'd like to create a large shared state, like a 10GB vector, and let
> multiple worker processes all be able to access it. Read-only access is
> fine. This is a common pattern for lots of machine learning algorithms
> that I write (where the large shared state is the model parameters; for
> example, the workers might compute model likelihood on different subsets of
> the full dataset).

I had to do a similar thing a while ago (long before Julia): training multiple SVMs with a big Gram matrix of which everything but one row and one column was shared. The interface was plain C, and I would do this by simply issuing a plain old fork() and letting the OS manage the memory with copy-on-write paging. The unique row and column of the Gram matrix would be automagically allocated by this copy-on-write in the child processes. That way I could use almost all of the machine's memory for a single SVM problem, and run as many problems in parallel as there were cores.
Perhaps it is possible to wrap libc's fork() in a ccall; a rough sketch of what that could look like is appended after the quoted message below. This might happen without Julia even being aware of it... Getting data back from the child processes may be a bit tricky; I suppose you can always use the file system (that's what I did at the time) or communicate through a pipe or socket.

Cheers,
---david

> Currently, if you try to have multiple @parallel workers/threads access a
> global array, it gets copied for each one. (The manual says this is the
> case, and I confirmed it in a test.)
>
> There seem to be some issues and pull requests that might be related, e.g.
> https://github.com/JuliaLang/julia/issues/1790
>
> Thanks -- Brendan
> --
> brenocon.com
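
For what it's worth, here is a rough, untested sketch of the fork()-via-ccall idea: the parent allocates the big read-only array once, each forked child sees it through the kernel's copy-on-write pages without any copy being made, does its share of the work, and writes its result to a file for the parent to collect (a pipe or socket would work the same way). Everything here is hypothetical and made up for illustration: the `params` array, the placeholder computation, the result file names. Also keep in mind that forking a live Julia process is fragile, since the runtime and its I/O machinery are not aware of the fork.

    params = rand(10^6)                       # stands in for the big read-only state

    nchildren = 4
    for i in 1:nchildren
        pid = ccall(:fork, Cint, ())
        if pid == 0
            # Child: sees `params` through the parent's copy-on-write pages;
            # nothing is duplicated as long as it only reads.
            result = sum(params) / i          # placeholder for the real per-subset work
            open("result_$(i).txt", "w") do io
                println(io, result)
            end
            ccall(:_exit, Cvoid, (Cint,), 0)  # leave without running Julia's normal teardown
        end
    end

    # Parent: wait for every child, then collect the results from the files.
    for _ in 1:nchildren
        ccall(:wait, Cint, (Ptr{Cvoid},), C_NULL)
    end
    results = [parse(Float64, readline("result_$(i).txt")) for i in 1:nchildren]

Files keep the sketch simple and match what I did back then; replacing them with a pipe per child would avoid touching the disk but needs a bit more bookkeeping in the parent.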
