On 03/12/2015 10:15 AM, Gregor Thalhammer wrote:
> 
> Another note, numpy makes it easy to provide new ufuncs, see 
> http://docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html
> from a C function that operates on 1D arrays, but this function needs to
> support arbitrary spacing (stride) between the items. Unfortunately, to
> achieve good performance, vector math libraries often expect that the
> items are laid out contiguously in memory. MKL/VML is a notable
> exception. So for non-contiguous in- or output arrays you might need to
> copy the data to a buffer, which likely kills large amounts of the
> performance gain.
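The strided-loop requirement in the quoted note can be modeled in a few lines. NumPy's C inner loops receive raw data pointers plus per-operand byte strides (the `steps` array in the ufunc tutorial). The following is only a pure-Python sketch of that shape, not NumPy's API. `exp_inner_loop`, `pack` and `unpack` are hypothetical names, and `struct` byte buffers stand in for array memory.

```python
import math
import struct

# Hypothetical pure-Python model of a ufunc-style inner loop. Real NumPy loops
# are C functions taking char pointers plus byte strides; this only mimics that.
F64 = struct.Struct("<d")  # little-endian float64, 8 bytes


def exp_inner_loop(src, src_off, src_step, dst, dst_off, dst_step, n):
    """Compute exp for n float64 items, reading and writing at byte strides."""
    for i in range(n):
        (x,) = F64.unpack_from(src, src_off + i * src_step)
        F64.pack_into(dst, dst_off + i * dst_step, math.exp(x))


def pack(values):
    """Serialize floats into a contiguous little-endian float64 bytearray."""
    return bytearray(b"".join(F64.pack(v) for v in values))


def unpack(buf):
    """Read every 8-byte float64 from a contiguous buffer."""
    return [F64.unpack_from(buf, off)[0] for off in range(0, len(buf), F64.size)]


if __name__ == "__main__":
    # Only the even slots hold inputs (the 9.0s stand for unrelated data), so
    # the input stride is two items (16 bytes) while the output is contiguous.
    src = pack([0.0, 9.0, 1.0, 9.0, 2.0, 9.0])
    dst = bytearray(3 * F64.size)
    exp_inner_loop(src, 0, 2 * F64.size, dst, 0, F64.size, 3)
    print(unpack(dst))
```

A library that only accepts contiguous input effectively assumes `src_step == dst_step == 8`. Any other layout (every other item, a reversed view with a negative step) needs either a loop like this or a copy.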

The elementary functions are very slow even compared to memory access:
they take on the order of hundreds to tens of thousands of cycles to
complete (depending on the argument range and the required accuracy).
Even with strided access, that gives the hardware prefetchers plenty of
time to load the data before the previous computation is done.
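This claim can be checked roughly on a given machine with a timing sketch. The sketch assumes a NumPy recent enough to have `np.random.default_rng`. The helper name `strided_vs_contiguous`, the sizes and the stride are illustrative choices. It compares a cheap operation (`np.negative`) against elementary functions on contiguous and strided inputs of equal length. No particular ratio is implied, because results depend on the CPU, on array size relative to cache, and on the NumPy version. Newer releases ship SIMD loops for some elementary functions, which may favor contiguous input.

```python
import timeit

import numpy as np


def strided_vs_contiguous(func, n=1_000_000, stride=4, number=10, repeat=5):
    """Best-of-`repeat` times (seconds) for `number` calls of func on n
    contiguous float64 items versus n items spaced `stride` elements apart."""
    rng = np.random.default_rng(0)
    base = rng.uniform(-1.0, 1.0, n * stride)
    contiguous = base[:n].copy()  # n items packed back to back
    strided = base[::stride]      # n items, stride * 8 bytes apart
    t_contig = min(timeit.repeat(lambda: func(contiguous), number=number, repeat=repeat))
    t_strided = min(timeit.repeat(lambda: func(strided), number=number, repeat=repeat))
    return t_contig, t_strided


if __name__ == "__main__":
    for name, func in [("np.negative", np.negative), ("np.exp", np.exp), ("np.sin", np.sin)]:
        tc, ts = strided_vs_contiguous(func)
        print(f"{name:12s} contiguous {tc:.4f} s  strided {ts:.4f} s  ratio {ts / tc:.2f}")
```

If the argument above holds on a given machine, the strided/contiguous ratio should be closer to 1 for the expensive functions than for the cheap one. That would be the sign that computation, not memory access, dominates.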

This also removes the requirement that the library provide a strided
API: we can copy the strided data into a contiguous buffer and pass it
to the library without losing much performance. It may not be optimal
(e.g. a library can fine-tune its prefetching better for cases where the
hardware prefetcher is not ideal), but it is most likely sufficient.
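The copy-to-buffer approach can be sketched with NumPy. Fixed-size chunks of a possibly strided input are gathered into a contiguous scratch buffer, handed to a routine that only accepts contiguous memory, and the results are scattered back. `contiguous_exp` is a hypothetical stand-in for such a vector-math routine and simply calls `np.exp`. `buffered_apply` and the default block size of 4096 elements are illustrative assumptions rather than tuned values. The sketch handles 1-D float64 arrays only.

```python
import numpy as np


def contiguous_exp(src, dst):
    """Hypothetical stand-in for a vector-math routine that only accepts
    C-contiguous float64 input and output (here it just calls np.exp)."""
    if not (src.flags.c_contiguous and dst.flags.c_contiguous):
        raise ValueError("kernel requires contiguous arrays")
    np.exp(src, out=dst)


def buffered_apply(kernel, x, out=None, block=4096):
    """Apply a contiguous-only kernel to a possibly strided 1-D float64 array
    by staging fixed-size chunks through two contiguous scratch buffers."""
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 1:
        raise ValueError("this sketch only handles 1-D arrays")
    if out is None:
        out = np.empty(x.shape, dtype=np.float64)
    if out.shape != x.shape:
        raise ValueError("out must have the same shape as x")
    if x.flags.c_contiguous and out.flags.c_contiguous:
        kernel(x, out)  # fast path: no copies needed
        return out
    in_buf = np.empty(block, dtype=np.float64)
    out_buf = np.empty(block, dtype=np.float64)
    for start in range(0, x.shape[0], block):
        n = min(block, x.shape[0] - start)
        in_buf[:n] = x[start:start + n]     # gather: strided -> contiguous
        kernel(in_buf[:n], out_buf[:n])     # the library sees only contiguous memory
        out[start:start + n] = out_buf[:n]  # scatter: contiguous -> (possibly strided) out
    return out


if __name__ == "__main__":
    a = np.linspace(0.0, 1.0, 10_001)
    view = a[::3]  # non-contiguous view with a 24-byte stride
    res = buffered_apply(contiguous_exp, view, block=1000)
    print(view.flags.c_contiguous, np.allclose(res, np.exp(view)))
```

The buffer is kept small so that each chunk stays in cache between the gather, the kernel call and the scatter. The two extra copies are cheap next to the elementary function itself, which is the trade-off argued for above.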

Figuring out how to do this with the best performance while staying
flexible about which implementation is used is part of the challenge
the student will face in this project.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion