On Aug 4, 3:24 am, "Anne Archibald" <[EMAIL PROTECTED]> wrote:
> It seems to me two things are needed:
>
> * A mechanism for requesting numpy arrays with buffers aligned to an
> arbitrary power-of-two size (basically just using posix_memalign or
> some horrible hack on platforms that don't have it).

Right, you might as well allow the alignment (to a power-of-two size) to be specified at runtime, since there is really no cost to implementing an arbitrary alignment once you have any alignment.

You should definitely use posix_memalign (or the older memalign) where it is available, but unfortunately it is not implemented on all systems. For example, Mac OS X and FreeBSD don't have it, last I checked (although in both cases their malloc returns 16-byte-aligned memory). Microsoft Visual C++ has an equivalent function called _aligned_malloc. However, since MinGW (www.mingw.org) didn't have an _aligned_malloc function, I wrote one for them a few years ago and put it in the public domain (I use MinGW to cross-compile to Windows from Linux and need the alignment). You are free to use it as a fallback on systems that don't have a memalign function if you want. It should work on any system where sizeof(void*) is a power of two (i.e., every extant architecture that I know of). You can download it and its test program from:

ab-initio.mit.edu/~stevenj/align.c
ab-initio.mit.edu/~stevenj/tstalign.c

It just uses malloc with a little extra padding as needed to align the data, plus a copy of the original pointer so that you can still free and realloc the block (using _aligned_free and _aligned_realloc). It could be made a bit more efficient, but that probably doesn't matter.

> * A macro (in C, and some way to get the same information from python,
> perhaps just "a.ctypes.data % 16") to test for common alignment cases;
> SIMD alignment and arbitrary power-of-two alignment are probably
> sufficient.

In C this is easy: just test ((uintptr_t) pointer) % 16 == 0.
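The padding trick described above can be sketched in C roughly as follows. This is a minimal illustration, not the contents of align.c: the names fallback_aligned_malloc, fallback_aligned_free, and IS_ALIGNED are hypothetical, realloc support is left out, and it assumes (as stated above) that sizeof(void*) is a power of two. IS_ALIGNED uses a bitmask, which for power-of-two alignments is equivalent to the % test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical macro: true if p lies on an `align`-byte boundary.
   For a power-of-two align, masking with align-1 is the same as % align. */
#define IS_ALIGNED(p, align) \
    ((((uintptr_t)(const void *)(p)) & ((uintptr_t)(align) - 1)) == 0)

/* Hypothetical fallback for systems without posix_memalign/_aligned_malloc:
   over-allocate with malloc, round the address up to the requested boundary,
   and stash the original malloc pointer just before the aligned block. */
void *fallback_aligned_malloc(size_t size, size_t align)
{
    /* The saved-pointer slot is only guaranteed to be pointer-aligned
       if sizeof(void *) is a power of two (the assumption noted above). */
    assert((sizeof(void *) & (sizeof(void *) - 1)) == 0);

    /* Reject zero or non-power-of-two alignments. */
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;
    /* Need at least pointer alignment so the saved pointer is aligned too. */
    if (align < sizeof(void *))
        align = sizeof(void *);

    /* Worst case: align-1 bytes of padding plus one slot for the saved pointer. */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for the saved pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~((uintptr_t)align - 1);

    /* Store the original pointer immediately before the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Release a block obtained from fallback_aligned_malloc (never plain free). */
void fallback_aligned_free(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

For example, a block from fallback_aligned_malloc(100, 64) satisfies both IS_ALIGNED(p, 64) and IS_ALIGNED(p, 16). It has to be released with fallback_aligned_free rather than free, because the pointer handed out is not the one malloc returned.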
You might also consider providing a way to set the default alignment of numpy arrays at runtime, rather than only requesting aligned arrays individually. That way, someone could later come along to a large program and add just one function call to make all of its arrays 16-byte aligned, improving performance with SIMD libraries.

Regards,
Steven G. Johnson
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion