On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor
<jtaylor.deb...@googlemail.com> wrote:
> On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith <n...@pobox.com> wrote:
>>
>> Also, none of the Py* interfaces implement calloc(), which is annoying
>> because it messes up our new optimization of using calloc() for
>> np.zeros. [...]
>
>
> Another thing that is not directly implemented in Python is aligned
> allocation. This is going to get increasingly important with the advent of
> heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C
> malloc being optimized for the oldish SSE (16 bytes). I want to change the
> array buffer allocation to make use of posix_memalign and C11 aligned_alloc
> if available to avoid some penalties when loading from non 32-byte aligned
> buffers. I could imagine it might also help coprocessors and GPUs to have
> higher alignments, but I'm not very familiar with that type of hardware.
> The allocator used by Python 3.4 is pluggable, so we could implement our
> special allocators with the new API, but only when 3.4 is more widespread.
>
> For this reason and the missing calloc I don't think we should use the
> Python API for data buffers just yet. Any benefits are relatively small
> anyway.

It really would be nice if our data allocations were all visible to the
tracemalloc library though, somehow. And I doubt we want to patch *all*
Python allocations to go through posix_memalign, both because this is
rather intrusive and because it would break python -X tracemalloc.

How certain are we that we want to switch to aligned allocators in the
future? If we don't, then maybe it makes sense to ask python-dev for a
calloc interface; but if we do, then I doubt we can convince them to add
aligned allocation interfaces, and we'll need to ask for something else
(maybe a "null" allocator, which just notifies the Python memory tracking
machinery that we allocated something ourselves?).

It's not obvious to me why aligning data buffers is useful. Can you
elaborate? There's no code simplification, because we always have to
handle the unaligned case anyway with the standard unaligned
startup/cleanup loops. And intuitively, given the existence of such loops,
alignment shouldn't matter much in practice, since the most that shifting
alignment can do is change the number of elements that need to be handled
by such loops by (SIMD alignment value / element size). For doubles, in a
buffer that has 16-byte alignment but not 32-byte alignment, this means
that in the worst case we end up doing 4 unnecessary non-SIMD operations.
And surely that only matters for very small arrays (for large arrays such
constant overhead will amortize out), but for small arrays SIMD doesn't
help much anyway?

Probably I'm missing something, because you actually know something about
SIMD and I'm just hand-waving from first principles :-). But it'd be nice
to understand the reasoning for why/whether alignment really helps in the
numpy context.

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
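
The worst-case count in the message above can be checked with a rough sketch. It assumes a simple loop structure: a scalar startup loop that runs until the first 32-byte boundary, a SIMD body that processes whole 32-byte vectors, and a scalar cleanup loop for the remainder. The function name `scalar_ops` and its parameters are hypothetical and only illustrate the arithmetic; this is not how numpy's loops are actually written.

```python
def scalar_ops(addr, n, itemsize=8, simd_align=32):
    """Count elements handled outside the SIMD body.

    Assumed model (not taken from numpy): a scalar startup loop runs until
    the first simd_align boundary, the SIMD body handles whole vectors, and
    a scalar cleanup loop handles what is left. addr is assumed to be a
    multiple of itemsize.
    """
    lanes = simd_align // itemsize          # elements per SIMD vector
    misalign = addr % simd_align            # bytes past the last boundary
    peel = ((simd_align - misalign) % simd_align) // itemsize
    peel = min(peel, n)                     # tiny arrays never reach the body
    tail = (n - peel) % lanes               # leftover after whole vectors
    return peel + tail


# Doubles in a buffer that is 16-byte but not 32-byte aligned (addr = 16),
# compared with a 32-byte aligned buffer (addr = 0).
extra = max(scalar_ops(16, n) - scalar_ops(0, n) for n in range(1, 1000))
print(extra)  # 4
```

Under these assumptions the extra scalar work never exceeds 4 elements for doubles, regardless of array length, which matches the estimate of (SIMD alignment value / element size). The sketch counts only operations; it says nothing about penalties for unaligned loads inside the SIMD body itself.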