Hans Meine wrote:
> Hi!
>
> I wonder why simple elementwise operations like "a * 2" or "a + 1" are not
> performed in order of increasing memory addresses in order to exploit CPU
> caches etc.

C-order is "special" in NumPy due to the history. I agree that it doesn't
need to be and we have taken significant steps in that direction. Right now,
the fundamental problem is probably due to the fact that the output arrays
are being created as C-order arrays when the input is a Fortran-order array.
Once that is changed then we can cause Fortran-order inputs to use the
simplest path through the code.
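
The point about output order can be checked directly by inspecting the contiguity flags of an input and of the result of an elementwise operation. This is only a minimal sketch: the helper name `layout` and the array contents are illustrative assumptions, and which layout the output ends up with depends on the NumPy version in use, so the script just reports it.

```python
import numpy as np


def layout(arr):
    """Describe memory layout from the contiguity flags (hypothetical helper)."""
    c, f = arr.flags.c_contiguous, arr.flags.f_contiguous
    if c and f:
        return "both"
    return "C" if c else "F" if f else "neither"


# A Fortran-ordered (column-major) input and the result of an elementwise op.
a = np.zeros((3, 3, 3), order="F")
b = a * 2

print("input :", layout(a), a.strides)
print("output:", layout(b), b.strides)      # layout here is version dependent
print("a.T   :", layout(a.T), a.T.strides)  # transposing only swaps the strides
```

If the output reports "C" while the input reports "F", the operation has to walk at least one of the two arrays out of memory order.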

Fortran-order arrays can be preserved, but it takes a little extra work
because backward-compatibility expectations had to be met. See for example
the order argument to the copy method of arrays.

> - as it is now, their speed drops by a factor of around 3 simply
> by transpose()ing. Similarly (but even less logical), copy() and even the
> constructor are affected (yes, I understand that copy() creates contiguous
> arrays, but shouldn't it respect/retain the order nevertheless?):
>

As mentioned, it can preserve order with the 'order' argument:

a.copy('A')

> ### constructor ###
> In [89]: %timeit -r 10 -n 1000000 numpy.ndarray((3,3,3))
> 1000000 loops, best of 10: 1.19 µs per loop
>
> In [90]: %timeit -r 10 -n 1000000 numpy.ndarray((3,3,3), order="f")
> 1000000 loops, best of 10: 2.19 µs per loop
>

I bet what you are seeing here is simply the overhead of processing the
order argument. Try the first one with order='c' to see what I mean.

> ### copy 3x3x3 array ###
> In [85]: a = numpy.ndarray((3,3,3))
>
> In [86]: %timeit -r 10 a.copy()
> 1000000 loops, best of 10: 1.14 µs per loop
>
> In [87]: a = numpy.ndarray((3,3,3), order="f")
>
> In [88]: %timeit -r 10 -n 1000000 a.copy()
> 1000000 loops, best of 10: 3.39 µs per loop
>

Use the 'A' argument to allow copying in "Fortran" order.

> ### copy 256x256x256 array ###
> In [74]: a = numpy.ndarray((256,256,256))
>
> In [75]: %timeit -r 10 a.copy()
> 10 loops, best of 10: 119 ms per loop
>
> In [76]: a = numpy.ndarray((256,256,256), order="f")
>
> In [77]: %timeit -r 10 a.copy()
> 10 loops, best of 10: 274 ms per loop
>

Same thing here. Nobody is opposed to having faster code as long as we
don't break code bases in the process. There is also the matter of
implementation....

-Travis O.
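
The order argument to the copy method mentioned above can be illustrated with a short script. This is a minimal sketch assuming a reasonably recent NumPy; the variable names and the test array are illustrative only, and no timings are claimed.

```python
import numpy as np

# Build a Fortran-contiguous 3x3x3 array with distinguishable contents.
a = np.asfortranarray(np.arange(27.0).reshape(3, 3, 3))

c_copy = a.copy()            # the method's default is order='C'
a_copy = a.copy(order="A")   # 'A': Fortran order if the source is F-contiguous, else C
f_copy = a.copy(order="F")   # always Fortran order

for name, arr in [("a", a), ("copy()", c_copy),
                  ("copy('A')", a_copy), ("copy('F')", f_copy)]:
    print(f"{name:10s} C={arr.flags.c_contiguous} F={arr.flags.f_contiguous}")
```

All three copies hold the same values; only the memory layout differs, which is what determines whether the copy can proceed in order of increasing addresses on both sides.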