On 17.5.2016 14:13, Sturla Molden wrote:

> Matěj Týč <matej....@gmail.com> wrote:
>
>>  - Parallel processing of HUGE data, and
> This is mainly a Windows problem, as copy-on-write fork() will solve this
> on any other platform. ...
That sounds interesting, could you elaborate on it a bit? Does it mean
that if you pass the numpy array to the child process using a Queue, no
significant amount of data will flow through it? Or should I not pass it
through a Queue at all and just rely on inheritance? Finally, I assume
that passing it as an argument to the Process class is the worst option,
because it will be pickled and unpickled.
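Something like the following is presumably what you mean (a minimal
sketch, assuming a Unix-like platform with the fork start method; the
sizes and the chunk_sum function are just illustrative). The array is
created in the parent before the pool, so the workers inherit it
copy-on-write and only the small slice objects get pickled:

import numpy as np
from multiprocessing import Pool

big = np.random.rand(8000000)   # ~64 MB, allocated once in the parent

def chunk_sum(sl):
    # The worker reads the inherited array directly; nothing is sent to
    # it except the small slice object it receives from map().
    return big[sl].sum()

if __name__ == '__main__':
    step = 2000000
    slices = [slice(i, i + step) for i in range(0, big.size, step)]
    with Pool(4) as pool:
        print(sum(pool.map(chunk_sum, slices)))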

Or maybe you are referring to modules such as joblib that use this
functionality and expose only a nice interface?
And finally, COW means that returning large arrays still involves data
moving between processes, whereas the shared-memory approach has the
workaround that the parent process can preallocate the result array,
which the worker processes can then write to.
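To make that preallocation idea concrete, here is a rough sketch of what
I have in mind (assuming multiprocessing.sharedctypes.RawArray plus
np.frombuffer; the sizes and the worker function are made up). The
parent allocates the result buffer in shared memory and the workers
write their slices into it, so nothing has to be shipped back through a
Queue:

import numpy as np
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

N = 1000000
shared = RawArray('d', N)        # result buffer lives in shared memory
result = np.frombuffer(shared)   # NumPy view on that buffer in the parent

def worker(buf, start, stop):
    out = np.frombuffer(buf)     # view on the same memory in the child
    out[start:stop] = np.arange(start, stop) ** 2

if __name__ == '__main__':
    step = N // 4
    procs = [Process(target=worker, args=(shared, i * step, (i + 1) * step))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(result[0], result[-1])   # 0.0 and (N - 1) ** 2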
> What this means is that shared memory is seldom useful for sharing huge
> data, even on Windows. It is only useful for this on Unix/Linux, where base
> addresses can stay the same. But on non-Windows platforms, the COW will in
> 99.99% of the cases be sufficient, thus make shared memory superfluous
> anyway. We don't need shared memory to scatter large data on Linux, only
> fork.
I am actually quite comfortable with sharing numpy arrays only. It is a
nice format for sharing large amounts of numbers, which is what I want
and what many modules accept as input (e.g. the "shapely" module).
