> A Thursday 05 March 2009, Dag Sverre Seljebotn escrigué: > Good point. I was not aware of this subtlity. In fact, numexpr does > not get well with transposed views of NumPy arrays. Filed the bug in: > > http://code.google.com/p/numexpr/issues/detail?id=18 > >> Not sure whether it is possible with other kinds of views >> (where e.g. a middle dimension varies fastest), but the NumPy model >> doesn't preclude it and I suppose it would be possible with >> stride_tricks. > > Middle dimensions varying first? Oh my! :)
I cannot see any obvious justification for letting the middle dimension varying first. C ordering is natural if we work with a "pointer to an array of pointers" or an "array of arrays", which in both cases will be indexed as array[i][j] in C: array[i][j] = (array[i])[j] = *(array[i]+j) = *(*(array+i)+j) While C has arrays and pointers, the difference is almost never visible to the programmer. This has lead some to erroneously believe that "C has no arrays, only pointers". However: double array[512]; double *p = array; Now sizeof(array) will be sizeof(double)*512, whereas sizeof(p) will be sizeof(long). This is one of very few cases where C arrays and pointers behave differently, but demonstrates the existence of arrays in C. The justification for Fortran ordering is in the mathematics. Say we have a set of linear equations A * X = B and are going to solve for X, using some magical subroutine 'solve'. The most efficient way to store these arrays becomes the Fortran ordering. That is, call solve(A, X, B) will be mathematically equivalent to the loop do i = i, n call solve(A, X(:,i), B) end do All the arrays in the call to solve are still kept contigous! This would not be the case with C ordering, which is an important reason that C sucks so much for numerical computing. To write efficient linear algebra in C, we have to store matrices in memory transposed to how they appear in mathematical equations. In fact, Matlab uses Fortran ordering because of this. While C ordering feels natural to computer scientists, who loves the beauty of pointer and array symmetries, it is a major obstacle for scientists and engineers from other fields. It is perhaps the most important reason why numerical code written in Fortran tend to be the faster: If a matrix is rank n x m in the equation, it should be rank n x m in the program as well, right? Not so with C. The better performance of Fortran for numerical code is often blamed on pointer aliasing in C. I believe wrong memory layout by the hands of the programmer is actually the more important reason. In fact, whenever I have done comparisons, the C compiler has always produced the faster machine code (gcc vs. gfortran or g77, icc vs. ifort). But to avoid the pitfall, one must be aware of it. And when a programmer's specialization is in another field, this is usually not the case. Most scientists doing some casual C, Java or Python programming fall straight into the trap. That is also why I personally feel it was a bad choice to let C ordering be the default in NumPy. S.M. _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion