On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> Apropos Julian's changes to use the PyObject_* allocation suite for some
> parts of numpy, I posted the following
>
> I think numpy memory management is due a cleanup. Currently we have
>
> PyDataMem_*
> PyDimMem_*
> PyArray_*
>
> Plus the malloc, PyMem_*, and PyObject_* interfaces. That is six ways to
> manage heap allocations. As far as I can tell, PyArray_* is always PyMem_*
> in practice. We probably need to keep the PyDataMem family as it has a
> memory tracking option, but PyDimMem just confuses things, I'd rather just
> use PyMem_* with explicit size. Curiously, the PyObject_Malloc family is not
> documented apart from some release notes.
>
> We should also check for the macro versions of PyMem_* as they are
> deprecated for extension modules.
>
> Nathaniel then suggested that we consider going all Python allocators,
> especially as new memory tracing tools are coming online in 3.4. Given that
> these changes could have some impact on current extension writers I thought
> I'd bring this up on the list to gather opinions.
>
> Thoughts?
After a bit more research, some further points to keep in mind:

Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free,
and PyDataMem_* is an alias for malloc/free with some extra tracing
hooks wrapped around it. (AFAIK, these tracing hooks are not used by
anyone anywhere -- at least, if they are I haven't heard about it, and
there is no code on github that uses them.)

There is one substantial difference between the PyMem_* and PyObject_*
interfaces as compared to malloc(), which is that the Py* interfaces
require that the GIL be held when they are called. (@Julian -- I think
your PR we just merged fulfills this requirement, is that right?) I
strongly suspect that we have PyDataMem_* calls outside of the GIL --
e.g., when allocating ufunc buffers -- and third-party code might as
well.

Python 3.4's new memory allocation API and tracing machinery are
documented here:
  http://www.python.org/dev/peps/pep-0445/
  http://docs.python.org/dev/c-api/memory.html
  http://docs.python.org/dev/library/tracemalloc.html

In particular, 3.4 adds a set of PyMem_Raw* functions (PyMem_RawMalloc,
PyMem_RawRealloc, PyMem_RawFree), which do not require the GIL.
Checking the current source code for _tracemalloc.c, it appears that
the PyMem_Raw* functions *are* traced, so that's nice -- it means that
switching PyDataMem_* to use PyMem_Raw* would be both safe and provide
benefits. However, PyMem_Raw* does not provide the pymalloc
optimizations for small allocations.

Also, none of the Py* interfaces implement calloc(), which is annoying
because it messes up our new optimization of using calloc() for
np.zeros. (calloc() is generally faster than malloc() plus explicit
zeroing, because it can use OS-specific virtual memory tricks to zero
out the memory "for free". These same tricks also mean that if you use
np.zeros() to allocate a large array, and then only write to a few
entries in that array, the total memory used is proportional to the
number of non-zero entries rather than to the nominal size of the
array, which can be extremely useful in some situations as a kind of
"poor man's sparse array".)

I'm pretty sure that the vast majority of our allocations do occur with
GIL protection, so we might want to switch to using PyObject_* for most
cases to take advantage of the small-object optimizations, and use
PyMem_Raw* for any non-GIL cases (like possibly ufunc buffers), with a
compatibility wrapper that replaces PyMem_Raw* with malloc() on
pre-3.4 Pythons. Of course this will need some profiling to see if
PyObject_* is actually better than malloc() in practice.

For calloc(), I see three options:
  (a) we could try to convince python-dev to add it;
  (b) np.zeros() could explicitly use calloc() even when other code
      uses the Py* interfaces, and then use an ndarray flag or a
      special .base object to keep track of the fact that we need to
      use free() to deallocate this memory; or
  (c) we could give up on the calloc optimization.

-n
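To make the flag-based calloc() option mentioned above a bit more concrete, here is a rough sketch in plain C. Every name in it (sketch_buffer, sketch_origin, sketch_py_malloc, sketch_py_free, and so on) is hypothetical and not part of the NumPy or CPython API. sketch_py_malloc/sketch_py_free merely stand in for PyObject_Malloc/PyObject_Free so that the snippet builds without Python.h. The only point is that each buffer remembers which allocator family produced it, so that deallocation can call the matching free function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a Py*-style allocator pair. In a real build
 * these would be PyObject_Malloc/PyObject_Free (called with the GIL held);
 * plain malloc/free is used here only so the sketch needs no Python.h. */
static void *sketch_py_malloc(size_t n) { return malloc(n); }
static void sketch_py_free(void *p) { free(p); }

/* Tag recording which allocator family owns a buffer. */
typedef enum { SKETCH_FROM_PY, SKETCH_FROM_CALLOC } sketch_origin;

typedef struct {
    void *data;
    size_t nbytes;
    sketch_origin origin;
} sketch_buffer;

/* General path: uninitialized memory from the Py*-style allocator. */
static sketch_buffer sketch_alloc(size_t nbytes)
{
    sketch_buffer b;
    b.data = sketch_py_malloc(nbytes);
    b.nbytes = (b.data != NULL) ? nbytes : 0;
    b.origin = SKETCH_FROM_PY;
    return b;
}

/* np.zeros()-style path: go straight to calloc() so the OS can hand back
 * lazily zeroed pages, and tag the buffer so we know to free() it later. */
static sketch_buffer sketch_alloc_zeroed(size_t count, size_t itemsize)
{
    sketch_buffer b;
    b.data = calloc(count, itemsize);
    b.nbytes = (b.data != NULL) ? count * itemsize : 0;
    b.origin = SKETCH_FROM_CALLOC;
    return b;
}

/* Release with whichever free function matches the recorded origin. */
static void sketch_release(sketch_buffer *b)
{
    assert(b != NULL);
    if (b->origin == SKETCH_FROM_CALLOC) {
        free(b->data);
    } else {
        sketch_py_free(b->data);
    }
    b->data = NULL;
    b->nbytes = 0;
}
```

A real version would presumably keep the tag in the ndarray flags or in a special .base object, as described above, rather than in a separate struct.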