On Sat, Apr 12, 2014 at 12:07 AM, Sturla Molden <sturla.mol...@gmail.com> wrote:
> On 12/04/14 00:39, Nathaniel Smith wrote:
>
>> The spawn mode is fine and all, but (a) the presence of something in
>> 3.4 helps only a minority of users, (b) "spawn" is not a full
>> replacement for fork;
>
> It basically does the same as on Windows. If you want portability to
> Windows, you must abide by these restrictions anyway.
Yes, but "sorry Unix guys, we've decided to take away this nice feature
from you because it doesn't work on Windows" is a really terrible
argument. If it can't be made to work, then fine, but fork safety is
just not *that* much to ask.

>> with large read-mostly data sets it can be a
>> *huge* win to load them into the parent process and then let them be
>> COW-inherited by forked children.
>
> The thing is that Python reference counts break COW fork. This has been
> discussed several times on the Python-dev list. What happens is that as
> soon as the child process updates a refcount, the OS copies the page.
> And because of how Python behaves, this copying of COW-marked pages
> quickly gets excessive. Effectively the performance of os.fork in Python
> will be close to a non-COW fork. A suggested solution is to move the
> refcount out of the PyObject struct, and perhaps keep them in a
> dedicated heap. But doing so will be unfriendly to cache.

Yes, it's limited, but again this is not a reason to break it in the
cases where it *does* work.

The case where I ran into this was loading a big language model using
SRILM:
  http://www.speech.sri.com/projects/srilm/
  https://github.com/njsmith/pysrilm
This produces a single Python object that references an opaque,
tens-of-gigabytes mess of C++ objects. For this case explicit shared
memory is useless, but fork worked brilliantly.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
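A minimal sketch of the "load once in the parent, let forked children inherit it" pattern discussed in this thread, assuming Python 3.4+ on a Unix system (the "fork" start method does not exist on Windows). `MODEL`, `load_model` and `worker` are hypothetical stand-ins for an expensive loader such as the SRILM model; a plain list of floats is used so the example runs anywhere. It does not work around the refcount problem Sturla describes: every object a child touches still gets its page copied.

```python
import multiprocessing as mp

# Module-level slot for a big, read-mostly object (hypothetical stand-in for
# something like a multi-gigabyte language model loaded once in the parent).
MODEL = None


def load_model(n):
    """Hypothetical loader: builds a large read-only list in the parent."""
    return [float(i) for i in range(n)]


def worker(index, out):
    # Runs in a forked child. MODEL was never pickled or sent over a pipe;
    # the child sees the parent's memory pages copy-on-write. Even this read
    # bumps refcounts on the objects it touches, so those pages get copied.
    out.put((index, MODEL[index] * 2.0))


def main(n=100_000, indices=(0, 1, 2)):
    global MODEL
    MODEL = load_model(n)  # load once, *before* any child is forked
    # get_context() was added in Python 3.4; "fork" is Unix-only.
    ctx = mp.get_context("fork")
    out = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, out)) for i in indices]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks on a full pipe.
    results = dict(out.get() for _ in procs)
    for p in procs:
        p.join()
    return [results[i] for i in indices]


if __name__ == "__main__":
    print(main())
```

If `"fork"` is swapped for `"spawn"` (the only option on Windows), each child starts a fresh interpreter and re-imports the module, so it sees `MODEL` as `None` and `worker` fails. Under spawn the data has to be reloaded in every child or passed explicitly, which is the restriction being argued about above.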