On 17.5.2016 14:13, Sturla Molden wrote:
> Matěj Týč <matej....@gmail.com> wrote:
>> - Parallel processing of HUGE data, and
> This is mainly a Windows problem, as copy-on-write fork() will solve this
> on any other platform.
...

That sounds interesting; could you elaborate on it a bit? Does it mean that if I pass the numpy array to the child process through a Queue, no significant amount of data will flow through it? Or should I not pass it through a Queue at all and just rely on inheritance? Finally, I assume that passing it as an argument to the Process class is the worst option, because it would be pickled and unpickled.
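To make sure I understand, relying on inheritance would look roughly like this? (Just a sketch with made-up names and sizes; it relies on the default fork start method on Linux, so only the small result goes through the Queue.)

    import numpy as np
    from multiprocessing import Process, Queue

    # Allocated in the parent before fork(); the child sees the same pages
    # via copy-on-write, so nothing large is copied or pickled.
    big = np.random.random((10000, 10000))

    def worker(q):
        # "big" is inherited through fork(); reading it does not copy it.
        q.put(big.sum())  # only this small float goes through the Queue

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=worker, args=(q,))
        p.start()
        print(q.get())
        p.join()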
Or maybe you are referring to modules such as joblib that use this functionality and expose only a nice interface?

And finally, copy-on-write means that returning large arrays still involves moving data between processes, whereas the shared-memory approach has the workaround that the parent process can preallocate the result array and the worker process can write into it (see the sketch at the end of this message).

> What this means is that shared memory is seldom useful for sharing huge
> data, even on Windows. It is only useful for this on Unix/Linux, where base
> addresses can stay the same. But on non-Windows platforms, the COW will in
> 99.99% of the cases be sufficient, thus make shared memory superfluous
> anyway. We don't need shared memory to scatter large data on Linux, only
> fork.

I am actually quite comfortable with sharing only numpy arrays. It is a nice format for sharing large amounts of numbers, which is what I want and what many modules accept as input (e.g. the "shapely" module).
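The preallocation workaround I have in mind looks roughly like this (again only a sketch with made-up names; the RawArray is inherited by the children through fork, and each worker writes its slice of the result in place):

    import numpy as np
    from multiprocessing import Process, RawArray

    n = 1000000
    # The parent preallocates the result buffer in shared memory.
    shared = RawArray('d', n)

    def fill(start, stop):
        # Wrap the shared buffer as a numpy array; this makes no copy.
        out = np.frombuffer(shared, dtype=np.float64)
        out[start:stop] = np.arange(start, stop)  # write the result in place

    if __name__ == '__main__':
        half = n // 2
        workers = [Process(target=fill, args=(0, half)),
                   Process(target=fill, args=(half, n))]
        for p in workers:
            p.start()
        for p in workers:
            p.join()
        result = np.frombuffer(shared, dtype=np.float64)
        print(result[0], result[-1])  # 0.0 and 999999.0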