Hey everyone,

Congratulations on shipping 0.16! I think I found a problem that seems pretty straightforward. Say I want to factor out the inner part of some N^2 loops over a float array. I write something like:

    from libc.math cimport sqrtf

    # Distance between elements i and j of a 1-D float memoryview.
    cdef inline float _inner(size_t i, size_t j, float[:] x):
        cdef float d = x[i] - x[j]
        return sqrtf(d * d)

In 0.16 this actually compiles (as opposed to 0.15 with ndarray), and the function is declared inline, which is great. However, the memoryview structure is passed by value:

    static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
        ...

This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to inline efficiently, although the function does in fact get inlined. If I manually inline the distance calculation, I get a speedup of more than 3x (in my case 0.324020147324 vs 1.43209195137 seconds for 10k elements). When I manually modified the generated .c file to pass the memoryview slice by pointer, the slowdown was eliminated completely.

On a somewhat related note, have you considered enabling the Issues page on GitHub?

Thanks!
Dimitri.
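
P.S. To make the by-value vs. by-pointer difference concrete, here is a minimal C sketch. It is not the code Cython generates: `slice_t`, `make_slice`, `load`, `inner_by_value`, `inner_by_pointer` and the two `sum_*` drivers are hypothetical names, and `slice_t` is only a stand-in for a slice descriptor, not the real `__Pyx_memviewslice` layout. The absolute difference is computed directly, which gives the same result as `sqrtf(d * d)` for ordinary values and keeps the example free of libm.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memoryview slice descriptor.
   Assumption: a data pointer plus fixed-size shape/stride arrays,
   which makes the struct fairly large to copy. */
typedef struct {
    char *data;
    ptrdiff_t shape[8];
    ptrdiff_t strides[8];
} slice_t;

/* Wrap a contiguous float buffer of n elements in a 1-D descriptor. */
static slice_t make_slice(float *buf, size_t n) {
    slice_t s = {0};
    s.data = (char *)buf;
    s.shape[0] = (ptrdiff_t)n;
    s.strides[0] = (ptrdiff_t)sizeof(float);
    return s;
}

/* Strided load of element i along the first dimension. */
static inline float load(const slice_t *s, size_t i) {
    return *(const float *)(s->data + (ptrdiff_t)i * s->strides[0]);
}

/* Variant 1: the descriptor is a by-value parameter, so each call
   semantically copies the whole struct unless the optimizer removes it. */
static inline float inner_by_value(size_t i, size_t j, slice_t x) {
    float d = load(&x, i) - load(&x, j);
    return d < 0.0f ? -d : d;
}

/* Variant 2: only a pointer to the descriptor is passed. */
static inline float inner_by_pointer(size_t i, size_t j, const slice_t *x) {
    float d = load(x, i) - load(x, j);
    return d < 0.0f ? -d : d;
}

/* N^2 drivers over all ordered pairs (i, j). */
static float sum_by_value(const slice_t *x) {
    float total = 0.0f;
    size_t n = (size_t)x->shape[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += inner_by_value(i, j, *x);
    return total;
}

static float sum_by_pointer(const slice_t *x) {
    float total = 0.0f;
    size_t n = (size_t)x->shape[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += inner_by_pointer(i, j, x);
    return total;
}
```

Both variants return identical results; whether the by-value copy actually costs anything depends on the compiler and optimization level, so timing both drivers on the target compiler is the only way to tell.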