Sturla, this sounds brilliant! To be clear, are you talking about serializing the numpy array and reconstructing it in a way that's faster than pickle? Or about using shared memory and signaling array creation around that shared memory rather than using pickle?
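For the second option I am picturing something roughly like the sketch below (made-up names, a multiprocessing.RawArray created ahead of the fork, and a Queue used only for signaling; this is just my guess, not necessarily what you have in mind):

import multiprocessing as mp
import numpy as np

SHAPE = (1000, 1000)

def worker(raw, shape, ready):
    # Re-wrap the shared buffer as an ndarray: no copy, no pickling of the data.
    arr = np.frombuffer(raw, dtype=np.float64).reshape(shape)
    ready.get()       # wait for the writer's signal
    print(arr.sum())  # reads what the parent wrote into shared memory

if __name__ == "__main__":
    raw = mp.RawArray("d", SHAPE[0] * SHAPE[1])  # created before spawning
    ready = mp.Queue()
    p = mp.Process(target=worker, args=(raw, SHAPE, ready))
    p.start()
    arr = np.frombuffer(raw, dtype=np.float64).reshape(SHAPE)
    arr[:] = 1.0      # write into shared memory
    ready.put(None)   # signal that the array is ready
    p.join()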
For what it's worth, I have used shared memory with numpy arrays as IPC (no queue), with one process writing to it and one process reading from it, and liked it. Your point #5 did not apply because I was reusing the shared memory. Do you have a public repo where you are working on this?

Thanks!
Elliot

On Wed, May 11, 2016 at 3:29 AM, Sturla Molden <sturla.mol...@gmail.com> wrote:
> I did some work on this some years ago. I have more or less concluded that
> it was a waste of effort. But first let me explain why the suggested
> approach does not work. Because it uses memory mapping to create shared
> memory (i.e. the shared segments are not named), the segments must be
> created ahead of spawning processes. But if you really want this to work
> smoothly, you want named shared memory (Sys V IPC or POSIX shm_open), so
> that shared arrays can be created in the spawned processes and passed back.
>
> Now for the reasons I don't care about shared memory arrays anymore, and
> what I am currently working on instead:
>
> 1. I have come across very few cases where threaded code cannot be used in
> numerical computing. In fact, multithreading nearly always happens in the
> code where I write pure C or Fortran anyway. Most often it happens in
> library code that is already multithreaded (Intel MKL, Apple Accelerate
> Framework, OpenBLAS, etc.), which means using it requires no extra effort
> from my side. A multithreaded LAPACK library is no less multithreaded if I
> call it from Python.
>
> 2. Getting shared memory right can be difficult because of hierarchical
> memory and false sharing. You might not see it if you only have a multicore
> CPU with a shared cache, but your code might not scale up on computers with
> more than one physical processor. False sharing acts like the GIL, except
> it happens in hardware and affects your C code invisibly, without any
> explicit locking you can pinpoint. This is also why MPI code tends to scale
> much better than OpenMP code: if nothing is shared, there will be no false
> sharing.
>
> 3. Raw C-level IPC is cheap – very, very cheap. Even if you use pipes or
> sockets instead of shared memory it is cheap. There are very few cases
> where the IPC itself is the bottleneck.
>
> 4. The reason IPC appears expensive with NumPy is that multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC overhead, but I
> will argue that it is not. The slowness of pickle is a separate problem
> altogether.
>
> 5. Shared memory does not improve on the pickle overhead, because NumPy
> arrays backed by shared memory must be pickled as well. Multiprocessing can
> bypass pickling the RawArray object, but the rest of the NumPy array is
> pickled. Using shared memory arrays therefore has no speed advantage over
> normal NumPy arrays when we use multiprocessing.
>
> 6. It is much easier to write concurrent code that uses queues for message
> passing than anything else. That is why using a Queue object has been the
> popular Pythonic approach to both multithreading and multiprocessing. I
> would like this to continue.
>
> I am therefore focusing my effort on the multiprocessing.Queue object. If
> you understand the six points I listed, you will see where this is going:
> what we really need is a specialized queue that has knowledge about NumPy
> arrays and can bypass pickle. I am therefore focusing my efforts on
> creating a NumPy-aware queue object.
>
> We are not doing the users a favor by encouraging the use of shared memory
> arrays. They help with nothing.
>
>
> Sturla Molden
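To make sure I follow points 4 and 5: even an ndarray that lives in shared memory gets its data copied by pickle when it goes through a Queue. A quick check (the size is just made up for the example):

import multiprocessing as mp
import numpy as np
import pickle

raw = mp.RawArray("d", 10**6)               # ~8 MB of shared memory
arr = np.frombuffer(raw, dtype=np.float64)  # ndarray view onto that memory

# Essentially what Queue.put() does to the ndarray part of the object:
blob = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)
print(len(blob))  # on the order of 8 MB: the data is copied despite living in shm

And if I understand point 6, a NumPy-aware queue would do something in the spirit of the sketch below: send only a small (dtype, shape) header through pickle and push the buffer itself with Connection.send_bytes(), so the array data never goes through pickle. This is only my guess at the idea, not your design, and it assumes C-contiguous arrays:

def send_array(conn, arr):
    conn.send((arr.dtype.str, arr.shape))  # tiny header, pickled
    conn.send_bytes(arr)                   # raw buffer, sent without pickling

def recv_array(conn):
    dtype, shape = conn.recv()
    data = conn.recv_bytes()
    return np.frombuffer(data, dtype=dtype).reshape(shape)

# e.g. over a pipe:
# a, b = mp.Pipe()
# send_array(a, np.arange(6, dtype=np.float64).reshape(2, 3))
# print(recv_array(b))

Is that roughly where you are heading?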