Hi,

Running on:
In []: np.__version__
Out[]: '1.5.1'
In []: sys.version
Out[]: '2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)]'

For the reference:
In []: X= randn(10, 125)
In []: timeit dot(X.T, X)
10000 loops, best of 3: 170 us per loop
In []: X= randn(10, 250)
In []: timeit dot(X.T, X)
1000 loops, best of 3: 671 us per loop
In []: X= randn(10, 500)
In []: timeit dot(X.T, X)
100 loops, best of 3: 5.15 ms per loop
In []: X= randn(10, 1000)
In []: timeit dot(X.T, X)
100 loops, best of 3: 20 ms per loop
In []: X= randn(10, 2000)
In []: timeit dot(X.T, X)
10 loops, best of 3: 80.7 ms per loop

Performance of triu_indices:
In []: timeit triu_indices(125)
1000 loops, best of 3: 662 us per loop
In []: timeit triu_indices(250)
100 loops, best of 3: 2.55 ms per loop
In []: timeit triu_indices(500)
100 loops, best of 3: 15 ms per loop
In []: timeit triu_indices(1000)
10 loops, best of 3: 59.8 ms per loop
In []: timeit triu_indices(2000)
1 loops, best of 3: 239 ms per loop

So the tri*_indices calculations seems to be unreasonable slow compared to for 
example calculations of inner products.

Now, just to compare for a very naive implementation of triu indices.
In []: def iut(n):
   ..:     r= np.empty(n* (n+ 1)/ 2, dtype= int)
   ..:     c= r.copy()
   ..:     a= np.arange(n)
   ..:     m= 0
   ..:     for i in xrange(n):
   ..:         ni= n- i
   ..:         mni= m+ ni
   ..:         r[m: mni]= i
   ..:         c[m: mni]= a[i: n]
   ..:         m+= ni
   ..:     return (r, c)
   ..:

Are we really calculating the same thing?
In []: triu_indices(5)
Out[]:
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))
In []: iut(5)
Out[]:
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4]))

Seems so, and then its performance:
In []: timeit iut(125)
1000 loops, best of 3: 992 us per loop
In []: timeit iut(250)
100 loops, best of 3: 2.03 ms per loop
In []: timeit iut(500)
100 loops, best of 3: 5.3 ms per loop
In []: timeit iut(1000)
100 loops, best of 3: 13.9 ms per loop
In []: timeit iut(2000)
10 loops, best of 3: 39.8 ms per loop

Even the naive implementation is very slow, but allready outperforms 
triu_indices, when n is > 250!

So finally my question is how one could substantially improve the performance 
of indices calculations?


Regards,
eat


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to