2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen <oyste...@gmail.com>:
> Thanks for your answer, Francesc. Knowing that there is no numpy solution
> saves the work of searching for this. I've not tried the solution described
> at SO, but it looks like a real performance killer. I'd rather try to
> override malloc with glibc's malloc hooks or LD_PRELOAD tricks. Do you think
> that will do it? I'll try it and report back.
>

I don't think you need that much weaponry. Just create an array with some
spare space for alignment. Suppose you want a 64-byte aligned double
precision array. For that, create your desired array + 64 additional bytes
(8 doubles):

In [92]: a = np.zeros(int(1e6) + 8)

In [93]: a.ctypes.data % 64
Out[93]: 16

and compute the number of elements to shift by:

In [94]: shift = (64 // a.itemsize) - (a.ctypes.data % 64) // a.itemsize

In [95]: shift
Out[95]: 6

now, create a view that drops the spare elements:

In [98]: b = a[shift:-((64 // a.itemsize)-shift)]

In [99]: len(b)
Out[99]: 1000000

In [100]: b.ctypes.data % 64
Out[100]: 0

and voila, b is now aligned to 64 bytes. As the view is a copy-free
operation, this is fast, and you only wasted 64 bytes. Pretty cheap indeed.

Francesc

>
> Thanks,
> -Øystein
>
> On Thu, May 5, 2016 at 1:55 PM, Francesc Alted <fal...@gmail.com> wrote:
>
>> 2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen <oyste...@gmail.com>:
>>
>>> Hi!
>>>
>>> I've written a little piece of numpy code that does a neural network
>>> feedforward calculation:
>>>
>>>     def feedforward(self,x):
>>>         for activation, w, b in zip(self.activations, self.weights, self.biases):
>>>             x = activation( np.dot(w, x) + b)
>>>
>>> This works fine when my activation functions are in Python. However, I've
>>> wrapped the activation functions from a C implementation that requires the
>>> array to be memory aligned (due to SIMD instructions in the C
>>> implementation). So I need the operation np.dot(w, x) + b to return an
>>> ndarray where the data pointer is aligned. How can I do that? Is it
>>> possible at all?
>>>
>>
>> Yes. np.dot() does accept an `out` parameter where you can pass your
>> aligned array. Testing whether numpy is returning you an aligned
>> array is easy:
>>
>> In [15]: x = np.arange(6).reshape(2,3)
>>
>> In [16]: x.ctypes.data % 16
>> Out[16]: 0
>>
>> but:
>>
>> In [17]: x.ctypes.data % 32
>> Out[17]: 16
>>
>> so, in this case NumPy returned a 16-byte aligned array, which should be
>> enough for 128-bit SIMD (SSE family). This kind of alignment is pretty
>> common in modern computers. If you need 256-bit (32-byte) alignment then
>> you will need to build your container manually. See here for an example:
>> http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays
>>
>> Francesc
>>
>>> (BTW: the function works correctly about 20% of the time I run it, and
>>> otherwise it segfaults on the SIMD instruction in the C function)
>>>
>>> Thanks,
>>> -Øystein
>>
>> --
>> Francesc Alted

--
Francesc Alted
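
For reference, the over-allocate-and-slice recipe from the reply can be wrapped in a small helper. This is only a sketch under stated assumptions: the name `aligned_zeros` and its parameters are hypothetical (not a NumPy API), and it assumes the base pointer returned by `np.zeros` is already a multiple of the item size. Unlike the interactive session, it takes the shift modulo the alignment, because when the buffer happens to be aligned already the session's formula gives shift = 8 and the slice `a[8:-0]` is empty.

```python
import numpy as np

def aligned_zeros(n, dtype=np.float64, alignment=64):
    """Hypothetical helper: zero-filled 1-D array of n items whose data
    pointer is a multiple of `alignment` bytes."""
    itemsize = np.dtype(dtype).itemsize
    if alignment % itemsize:
        raise ValueError("alignment must be a multiple of the item size")
    extra = alignment // itemsize            # spare items, e.g. 8 doubles for 64 bytes
    buf = np.zeros(n + extra, dtype=dtype)   # over-allocate by one alignment unit
    # Items to skip so the first kept item sits on an aligned address.
    # Assumes buf.ctypes.data is a multiple of itemsize.
    # Evaluates to 0 (not `extra`) when buf is already aligned.
    shift = ((-buf.ctypes.data) % alignment) // itemsize
    return buf[shift:shift + n]              # a view, no copy

# Usage: keep the result of np.dot(w, x) + b inside the aligned buffer.
w = np.arange(12.0).reshape(4, 3)
x = np.ones(3)
b = np.ones(4)

out = aligned_zeros(4)
np.dot(w, x, out=out)      # writes into the aligned view
np.add(out, b, out=out)    # in-place add; `out + b` would allocate a new array
```

Note that writing `np.dot(w, x, out=out) + b` would still produce a fresh, possibly unaligned array, which is why the addition is also done with `out=`.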
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion