On 8/3/07, David Cournapeau <[EMAIL PROTECTED]> wrote:
>
> Andrew Straw wrote:
> > Dear David,
> >
> > Both ideas, particularly the 2nd, would be excellent additions to numpy.
> > I often use the Intel IPP (Integrated Performance Primitives) Library
> > together with numpy, but I have to do all my memory allocation with the
> > IPP to ensure fastest operation. I then create numpy views of the data.
> > All this works brilliantly, but it would be really nice if I could
> > allocate the memory directly in numpy.
> >
> > IPP allocates, and says it wants, 32 byte aligned memory (see, e.g.
> > http://www.intel.com/support/performancetools/sb/CS-021418.htm ). Given
> > that fftw3 apparently wants 16 byte aligned memory, my feeling is that,
> > if the effort is made, the alignment width should be specified at
> > run-time, rather than hard-coded.
> I think that doing it at runtime would be overkill, no? I was thinking
> about making it a compile option. Generally, at the ASM level, you need
> 16 bytes alignment (for instructions like movaps, which takes 16 bytes
> in memory and put it in the SSE registers), this is not just fftw. Maybe
> the 32 bytes alignment is useful for cache reasons, I don't know.
>
> I don't think it would be difficult to implement and validate; what I
> don't know at all is the implication of this at the binary level, if any.

Here's a hack that Google turned up:

(1) Use static variables instead of dynamic (stack) variables
(2) Use in-line assembly code that explicitly aligns data
(3) In C code, use malloc() to explicitly allocate variables

Here is Intel's example of (2):

    ; procedure prologue
    push ebp
    mov esp, ebp
    and ebp, -8
    sub esp, 12

    ; procedure epilogue
    add esp, 12
    pop ebp
    ret

Intel's example of (3), slightly modified:

    double *p, *newp;
    /* over-allocate by 7 bytes so an 8-byte boundary always fits */
    p = (double *)malloc(sizeof(double) * NPTS + 7);
    /* round up through an integer type (uintptr_t, from <stdint.h>) */
    newp = (double *)(((uintptr_t)p + 7) & ~(uintptr_t)7);

This ensures that newp is 8-byte aligned even if p is not. Keep p around,
since it is the pointer that must later be passed to free(). However,
malloc() may already follow Intel's recommendation that data structures of
32 bytes or more be aligned on a 32-byte boundary. In that case, increasing
the requested memory by 7 bytes and computing newp are superfluous.

Chuck
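
For what it's worth, the malloc() trick generalizes to an alignment chosen at run time, which is what Andrew was asking about. Below is a minimal sketch, not anything taken from numpy, IPP, or Intel's note. The names aligned_malloc and aligned_free are hypothetical helpers invented for this illustration, and the scheme assumes the alignment is a power of two. It over-allocates, rounds the address up, and stashes the raw malloc() pointer just below the aligned block so it can be freed later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return `size` bytes whose address is a multiple of
   `alignment` (assumed to be a power of two), or NULL on failure. */
static void *aligned_malloc(size_t size, size_t alignment)
{
    /* Reject zero and anything that is not a power of two. */
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Worst-case padding plus room to remember the raw pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;

    void *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Leave space for the saved pointer, then round up to the boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* The aligned block must still fit inside what malloc() returned. */
    assert(aligned + size <= (uintptr_t)raw + size + extra);

    /* Store the raw pointer just below the aligned block. memcpy is used
       because that slot need not be pointer-aligned for tiny alignments. */
    memcpy((void *)(aligned - sizeof(void *)), &raw, sizeof raw);
    return (void *)aligned;
}

/* Hypothetical helper: release a block obtained from aligned_malloc(). */
static void aligned_free(void *p)
{
    if (p != NULL) {
        void *raw;
        memcpy(&raw, (char *)p - sizeof(void *), sizeof raw);
        free(raw);
    }
}
```

With this, something like aligned_malloc(NPTS * sizeof(double), 16) would cover the SSE case and an alignment of 32 the IPP case. The cost is a few extra bytes per allocation, and such blocks must be released with aligned_free() rather than free().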
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion