Hi, Running on: In []: np.__version__ Out[]: '1.5.1' In []: sys.version Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'
For the reference: In []: X= randn(10, 125) In []: timeit dot(X.T, X) 10000 loops, best of 3: 170 us per loop In []: X= randn(10, 250) In []: timeit dot(X.T, X) 1000 loops, best of 3: 671 us per loop In []: X= randn(10, 500) In []: timeit dot(X.T, X) 100 loops, best of 3: 5.15 ms per loop In []: X= randn(10, 1000) In []: timeit dot(X.T, X) 100 loops, best of 3: 20 ms per loop In []: X= randn(10, 2000) In []: timeit dot(X.T, X) 10 loops, best of 3: 80.7 ms per loop Performance of triu_indices: In []: timeit triu_indices(125) 1000 loops, best of 3: 662 us per loop In []: timeit triu_indices(250) 100 loops, best of 3: 2.55 ms per loop In []: timeit triu_indices(500) 100 loops, best of 3: 15 ms per loop In []: timeit triu_indices(1000) 10 loops, best of 3: 59.8 ms per loop In []: timeit triu_indices(2000) 1 loops, best of 3: 239 ms per loop So the tri*_indices calculations seems to be unreasonable slow compared to for example calculations of inner products. Now, just to compare for a very naive implementation of triu indices. In []: def iut(n): ..: r= np.empty(n* (n+ 1)/ 2, dtype= int) ..: c= r.copy() ..: a= np.arange(n) ..: m= 0 ..: for i in xrange(n): ..: ni= n- i ..: mni= m+ ni ..: r[m: mni]= i ..: c[m: mni]= a[i: n] ..: m+= ni ..: return (r, c) ..: Are we really calculating the same thing? In []: triu_indices(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) In []: iut(5) Out[]: (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4])) Seems so, and then its performance: In []: timeit iut(125) 1000 loops, best of 3: 992 us per loop In []: timeit iut(250) 100 loops, best of 3: 2.03 ms per loop In []: timeit iut(500) 100 loops, best of 3: 5.3 ms per loop In []: timeit iut(1000) 100 loops, best of 3: 13.9 ms per loop In []: timeit iut(2000) 10 loops, best of 3: 39.8 ms per loop Even the naive implementation is very slow, but allready outperforms triu_indices, when n is > 250! So finally my question is how one could substantially improve the performance of indices calculations? Regards, eat _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion