On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs <jac...@bioinformed.com> <bioinfor...@gmail.com> wrote:
> The cKDTree implementation is more than 4 times faster than the brute-force
> approach:
>
> T = scipy.spatial.cKDTree(targets)
>
> In [11]: %timeit foo1(element, targets)  # Brute force
> 1000 loops, best of 3: 385 us per loop
>
> In [12]: %timeit foo2(element, T)  # cKDTree
> 10000 loops, best of 3: 83.5 us per loop
>
> In [13]: 385/83.5
> Out[13]: 4.610778443113772

Brute force plus cython beats cKDTree for knn with k=1 and Euclidean distance:

I[5] targets = np.random.uniform(0, 8, (5000, 7))
I[6] element = np.arange(1, 8, dtype=np.float64)
I[7] T = scipy.spatial.cKDTree(targets)
I[8] timeit T.query(element)
10000 loops, best of 3: 36.1 us per loop
I[9] timeit nn(targets, element)
10000 loops, best of 3: 28.5 us per loop

What about lower dimensions (2 instead of 7), where cKDTree gets faster?

I[18] element = np.arange(1, 3, dtype=np.float64)
I[19] targets = np.random.uniform(0, 8, (5000, 2))
I[20] T = scipy.spatial.cKDTree(targets)
I[21] timeit T.query(element)
10000 loops, best of 3: 27.5 us per loop
I[22] timeit nn(targets, element)
100000 loops, best of 3: 11.6 us per loop

I could add nn to the bottleneck package. Is the k=1, Euclidean distance case too specialized?

Prototype (messy) code is here:
https://github.com/kwgoodman/bottleneck/issues/45

For a smaller speedup (~2x), foo1 could use bottleneck's sum of squares function, bn.ss().

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
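[For reference, a minimal pure-NumPy sketch of the brute-force k=1 Euclidean nearest-neighbor query being timed above as nn(). This is an assumption about what the cython prototype computes, not the prototype itself; the cython version avoids the temporary arrays and is correspondingly faster.]

```python
import numpy as np
import scipy.spatial


def nn(targets, element):
    # Brute-force k=1 Euclidean nearest neighbor (pure-NumPy sketch):
    # squared distance from element to every row of targets, then argmin.
    d2 = ((targets - element) ** 2).sum(axis=1)
    i = d2.argmin()
    return np.sqrt(d2[i]), i


# Same setup as the timing session above.
targets = np.random.uniform(0, 8, (5000, 7))
element = np.arange(1, 8, dtype=np.float64)

dist, idx = nn(targets, element)

# Cross-check against cKDTree: both should find the same neighbor.
T = scipy.spatial.cKDTree(targets)
kdist, kidx = T.query(element)
```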