On Wed, Jan 8, 2014 at 12:13 PM, Julian Taylor
<jtaylor.deb...@googlemail.com> wrote:
> On 18.07.2013 15:36, Nathaniel Smith wrote:
>> On Wed, Jul 17, 2013 at 5:57 PM, Frédéric Bastien <no...@nouiz.org> wrote:
>>> Regarding the usefulness of doing only 1 memory allocation: in our old
>>> gpu ndarray we were doing 2 allocations on the GPU, one for metadata
>>> and one for data. I removed this, as it was a bottleneck. Allocations
>>> on the CPU are faster than on the GPU, but they are still slow unless
>>> you reuse memory. Does PyMem_Malloc reuse previous small allocations?
>>
>> Yes, at least in theory PyMem_Malloc is highly optimized for small
>> buffer re-use. (For requests >256 bytes it just calls malloc().) And
>> it's possible to define type-specific freelists; not sure if there's
>> any value in doing that for PyArrayObjects. See Objects/obmalloc.c in
>> the Python source tree.
>
> PyMem_Malloc is just a wrapper around malloc, so it's only as optimized
> as the C library is (glibc is not good for small allocations).
> PyObject_Malloc uses a small object allocator for requests smaller than
> 512 bytes (256 in Python 2).
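
In rough-sketch form, the distinction being drawn here (nothing
NumPy-specific, just the documented CPython allocator calls; the helper
names are invented for illustration):

    #include <Python.h>

    /* PyMem_Malloc forwards to the C library's malloc(), so small
     * requests cost whatever glibc charges for them.  Pair with
     * PyMem_Free(). */
    static Py_ssize_t *
    dims_via_pymem(int nd)
    {
        return (Py_ssize_t *)PyMem_Malloc(nd * sizeof(Py_ssize_t));
    }

    /* PyObject_Malloc serves requests below the small-object threshold
     * (512 bytes; 256 on Python 2) from pymalloc's arenas, which is what
     * makes repeated tiny allocations cheap.  Must be paired with
     * PyObject_Free(), not free() or PyMem_Free(). */
    static Py_ssize_t *
    dims_via_pyobject(int nd)
    {
        return (Py_ssize_t *)PyObject_Malloc(nd * sizeof(Py_ssize_t));
    }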

Right, I meant PyObject_Malloc of course.

> I filed a pull request [0] replacing a few functions which I think are
> safe to convert to this API: the nditer allocation, which is completely
> encapsulated, and the construction of the scalar and array Python
> objects, which are deleted via the tp_free slot (we really should not
> support third-party libraries using PyMem_Free on Python objects
> without checks).
>
> This already gives up to 15% improvements for scalar operations
> compared to glibc 2.17 malloc.
>
> Do I understand the discussions here right that we could replace
> PyDimMem_NEW, which is used for strides in PyArray, with the small
> object allocation too?
> It would still allow swapping the stride buffer, but every application
> must then delete it with PyDimMem_FREE, which should be a reasonable
> requirement.

That sounds reasonable to me.

If we wanted to get even more elaborate, we could by default stick the
shape/strides into the same allocation as the PyArrayObject, and then
defer allocating a separate buffer until someone actually calls
PyArray_Resize. (With a new flag, similar to OWNDATA, that tells us
whether we need to free the shape/stride buffer when deallocating the
array.)

It's got to be a vanishingly small proportion of arrays where
PyArray_Resize is actually called, so for most arrays this would let us
skip the allocation entirely, and the only cost would be that for arrays
where PyArray_Resize *is* called to add new dimensions, we'd leave the
original buffers sitting around until the array was freed, wasting a
tiny amount of memory. Given that no-one has noticed that currently
*every* array wastes 50% of this much memory (see upthread), I doubt
anyone will care...

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
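
To make the "inline shape/strides" idea concrete, here is a rough sketch;
the struct, field, flag, and function names are invented for illustration
and this is not NumPy's actual PyArrayObject layout:

    #include <Python.h>

    #define SKETCH_OWNS_DIMS 0x01   /* in spirit like the OWNDATA flag */

    typedef struct {
        PyObject_HEAD
        int         nd;
        Py_ssize_t *dimensions;   /* normally points just past the struct */
        Py_ssize_t *strides;      /* normally dimensions + nd */
        int         flags;
        /* 2*nd Py_ssize_t slots follow in the same allocation */
    } sketch_array;

    /* One allocation covers the object plus its shape/strides, so the
     * common case never touches the allocator a second time. */
    static sketch_array *
    sketch_array_alloc(PyTypeObject *type, int nd)
    {
        size_t extra = 2 * (size_t)nd * sizeof(Py_ssize_t);
        sketch_array *a = (sketch_array *)PyObject_Malloc(sizeof(*a) + extra);
        if (a == NULL) {
            return (sketch_array *)PyErr_NoMemory();
        }
        (void)PyObject_INIT((PyObject *)a, type);
        a->nd = nd;
        a->dimensions = (Py_ssize_t *)(a + 1);
        a->strides = a->dimensions + nd;
        a->flags = 0;
        return a;
    }

    /* Only a resize that needs more room allocates a separate buffer and
     * sets the flag; the inline slots are simply left unused afterwards.
     * (Copying the old values into the new buffer is omitted here.) */
    static int
    sketch_array_grow_dims(sketch_array *a, int new_nd)
    {
        size_t nbytes = 2 * (size_t)new_nd * sizeof(Py_ssize_t);
        Py_ssize_t *buf = (Py_ssize_t *)PyObject_Malloc(nbytes);
        if (buf == NULL) {
            PyErr_NoMemory();
            return -1;
        }
        if (a->flags & SKETCH_OWNS_DIMS) {
            PyObject_Free(a->dimensions);   /* a previous separate buffer */
        }
        a->dimensions = buf;
        a->strides = buf + new_nd;
        a->nd = new_nd;
        a->flags |= SKETCH_OWNS_DIMS;
        return 0;
    }

Dealloc then just checks the flag and frees the separate buffer only when
it is set, the same way OWNDATA is checked for the data pointer today.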