On Wed, Dec 19, 2012 at 3:27 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
>
>
> On Wed, Dec 19, 2012 at 8:10 AM, Nathaniel Smith <n...@pobox.com> wrote:
>> Right, my intuition is that it's like order="C" -- if you make a new
>> array by, say, indexing, then it may or may not have order="C", no
>> guarantees. So when you care, you call asarray(a, order="C") and that
>> either makes a copy or not as needed. Similarly for base alignment.
>>
>> I guess to push this analogy even further we could define a set of
>> array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2
>> alignment matters, I think, so the number of flags would remain
>> manageable?) That would make the C API easier to deal with too, no
>> need to add PyArray_FromAnyAligned.
>>
>
> Another possibility is an aligned datatype, basically an aligned structured
> array with floats/ints in chunks of the appropriate size. IIRC, gcc support
> for sse is something like that.

True; right now it looks like structured dtypes have no special alignment:

  In [13]: np.dtype("f4,f4").alignment
  Out[13]: 1

So for this approach we'd need a way to create structured dtypes with
.alignment == .itemsize, and we'd need some way to request dtype-aligned
memory from the array allocation functions. I guess the existing
NPY_ALIGNED flag is a good enough public interface for the latter, but
AFAICT the current implementation just assumes that whatever malloc()
returns will always be ALIGNED. That is true for all base C types, but
not for more exotic record types with larger alignment requirements --
those would need some fancier allocation scheme.

Not sure which interface is more useful to users. On the one hand,
using funny dtypes makes regular non-SIMD access more cumbersome, and
it forces your array size to be a multiple of the SIMD word size, which
might be inconvenient if your code is smart enough to handle
arbitrary-sized arrays with partial SIMD acceleration (i.e., using SIMD
for most of the array, and then a slow path to handle any partial word
at the end). OTOH, if your code *is* that smart, you should probably
just make it smart enough to handle a partial word at the beginning as
well, and then you won't need any special alignment in the first place.
Also, representing each SIMD word as a single numpy scalar is an
intuitively appealing model of how SIMD works. OTOOH, just adding a
single argument to np.array() is much simpler to explain than some
elaborate scheme involving the creation of special custom dtypes.

-n
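
P.S. To make the "ask for alignment when you care, copy only if needed"
option concrete, here is a rough pure-Python sketch. The names
is_aligned, aligned_empty and as_aligned are hypothetical helpers made
up for illustration, not NumPy API, and the over-allocate-and-slice
trick is only a stand-in for whatever a real allocator would do.

```python
import numpy as np


def is_aligned(a, alignment):
    """True if the first byte of `a` sits on an `alignment`-byte boundary."""
    return a.ctypes.data % alignment == 0


def aligned_empty(shape, dtype=float, alignment=16):
    """Hypothetical helper: uninitialised array with an aligned data pointer."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape, dtype=np.intp)) * dtype.itemsize
    # Over-allocate so an aligned start address always fits in the buffer.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = -buf.ctypes.data % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)


def as_aligned(a, alignment=16):
    """Hypothetical helper: like asarray(a, order="C"), copy only if needed."""
    a = np.asarray(a)
    if is_aligned(a, alignment):
        return a  # already fine, no copy
    out = aligned_empty(a.shape, a.dtype, alignment)
    out[...] = a
    return out
```

With something like this, code that cares would call as_aligned(a, 16)
at its entry point, and slices such as a[1:] that lose alignment simply
get copied on the way in, the same way asarray(a, order="C") behaves
today.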