Re: [Numpy-discussion] Distance Matrix speed

2006-06-20 Thread Alan G Isaac
I think the distance matrix version below is about as good as it gets with these basic strategies.

fwiw,
Alan Isaac

def dist(A,B):
    rowsA, rowsB = A.shape[0], B.shape[0]
    distanceAB = empty( [rowsA,rowsB] , dtype=float)
    if rowsA <= rowsB:
        temp = empty_like(B)
        for i in

Re: [Numpy-discussion] Distance Matrix speed

2006-06-19 Thread Sebastian Beca
I just ran Alan's script and I don't get consistent results for 100 repetitions. I boosted it to 1000 and ran it several times. The faster one varied a lot, but both came within about ±1.5% of each other. When it comes to scaling, for my problem (fuzzy clustering), N is the size of the dataset, which
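
For run-to-run variation like this, one common approach is to repeat the whole measurement several times and keep the minimum. The sketch below is illustrative only, not the script discussed in the thread: `best_time` is a hypothetical helper name, and it assumes the standard-library `timeit` module.

```python
import timeit

import numpy as np


def best_time(func, *args, number=100, repeat=5):
    """Return the best per-call time in seconds (hypothetical helper).

    The statement is run `number` times per measurement, the measurement
    is repeated `repeat` times, and the minimum is kept, since the
    minimum is the value least disturbed by other activity on the machine.
    """
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    x = np.ones(1000)
    print(best_time(np.sqrt, x, number=1000, repeat=5))
```

Comparing the minima of two candidates tends to be more stable than comparing single runs, although small differences of a percent or two may still be noise.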

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Sebastian Beca
I checked the Matlab version's code and it does the same as discussed here. The only thing to check is that you loop over the shorter dimension of the output array. Speedwise, the Matlab code still runs about twice as fast for large sets of data (by just taking time by hand and
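
The "loop over the shorter dimension" idea can be sketched as follows. This is a minimal illustration, not the code from the thread: it assumes a current NumPy imported as `np`, takes the arrays as arguments instead of globals, and the name `dist_loop_short` is made up for this example.

```python
import numpy as np


def dist_loop_short(A, B):
    """Euclidean distances between the rows of A (N x K) and B (C x K).

    Returns an N x C array. The Python-level loop runs over whichever of
    N and C is smaller, so the vectorised work per iteration is as large
    as possible.
    """
    N, C = A.shape[0], B.shape[0]
    d = np.empty((N, C), dtype=float)
    if N < C:
        for i in range(N):
            diff = A[i] - B                      # shape (C, K)
            d[i, :] = np.sqrt((diff ** 2).sum(axis=1))
    else:
        for j in range(C):
            diff = A - B[j]                      # shape (N, K)
            d[:, j] = np.sqrt((diff ** 2).sum(axis=1))
    return d
```

With 4 rows on one side and 1000 on the other, this performs 4 loop iterations instead of 1000, whichever argument holds the short array.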

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Alan G Isaac
On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:

def dist():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in range(C):
            xy = A - B[j]
            d[:,j] = sqrt(sum(xy**2, axis=1))
        return d

But that is 50%

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Tim Hochberg
Alan G Isaac wrote:

On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:

def dist():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in range(C):
            xy = A - B[j]
            d[:,j] = sqrt(sum(xy**2, axis=1))
        return d

Re: [Numpy-discussion] Distance Matrix speed

2006-06-18 Thread Alan G Isaac
On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:

Alan G Isaac wrote:

On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:

def dist():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in

Re: [Numpy-discussion] Distance Matrix speed

2006-06-17 Thread Johannes Loehnert
Hi,

def d4():
    d = zeros([4, 1000], dtype=float)
    for i in range(4):
        xy = A[i] - B
        d[i] = sqrt( sum(xy**2, axis=1) )
    return d

Maybe there's another alternative to d4? Thanks again,

I think this is the fastest you can get. Maybe it would be nicer to use the

Re: [Numpy-discussion] Distance Matrix speed

2006-06-17 Thread Alex Cannon
How about this?

def d5():
    return add.outer(sum(A*A, axis=1), sum(B*B, axis=1)) - \
        2.*dot(A, transpose(B))

___ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net

Re: [Numpy-discussion] Distance Matrix speed

2006-06-17 Thread Robert Kern
Alex Cannon wrote:

How about this?

def d5():
    return add.outer(sum(A*A, axis=1), sum(B*B, axis=1)) - \
        2.*dot(A, transpose(B))

You might lose some precision with that approach, so the OP should compare results and timings to look at the tradeoffs.

-- Robert
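
One way to do the comparison suggested here is sketched below. It is an illustration under assumptions, not code from the thread: it uses a current NumPy imported as `np`, and the function names are invented for the example. Note that d5 as posted returns squared distances, so the sketch adds a square root, and it clips small negative values that round-off in the expanded form can produce.

```python
import numpy as np


def dist_direct(A, B):
    """Distances from explicit differences (reference result)."""
    diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def dist_expanded(A, B):
    """Distances via |a|^2 + |b|^2 - 2 a.b, the identity used by d5."""
    sq = np.add.outer((A * A).sum(axis=1), (B * B).sum(axis=1))
    sq -= 2.0 * np.dot(A, B.T)
    # Cancellation can leave tiny negative values; clip before the sqrt.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)


def max_abs_difference(A, B):
    """Largest absolute disagreement between the two methods."""
    return np.abs(dist_direct(A, B) - dist_expanded(A, B)).max()
```

The expanded form subtracts numbers of similar size, so the disagreement is expected to be largest for points that are close together relative to their distance from the origin. Running `max_abs_difference` on representative data shows whether that matters for a given application.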

[Numpy-discussion] distance matrix speed

2006-06-16 Thread Sebastian Beca
Hi, I'm working with NumPy/SciPy on some algorithms and I've run into some important speed differences wrt Matlab 7. I've narrowed the main speed problem down to the operation of finding the Euclidean distance between two matrices that share one dimension (dist in Matlab): Python: def

Re: [Numpy-discussion] distance matrix speed

2006-06-16 Thread Michael Sorich
Hi Sebastian,

I am not sure if there is a function already defined in numpy, but something like this may be what you are after:

def distance(a1, a2):
    return sqrt(sum((a1[:,newaxis,:] - a2[newaxis,:,:])**2, axis=2))

The general idea is to avoid loops if you want the code to execute fast. I
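
To make the broadcasting step concrete, here is a small self-contained version of the same idea with a worked example. It is a sketch that assumes a current NumPy imported as `np`, and it uses the array method `.sum` in place of the bare `sum` in the post.

```python
import numpy as np


def distance(a1, a2):
    """Pairwise Euclidean distances between rows of a1 and rows of a2."""
    # a1[:, newaxis, :] has shape (N, 1, K); a2[newaxis, :, :] has shape
    # (1, C, K). Broadcasting yields an (N, C, K) array of differences.
    diff = a1[:, np.newaxis, :] - a2[np.newaxis, :, :]
    # Summing the squares over the last axis leaves an (N, C) result.
    return np.sqrt((diff ** 2).sum(axis=2))


if __name__ == "__main__":
    a1 = np.array([[0.0, 0.0], [3.0, 4.0]])
    a2 = np.array([[0.0, 0.0]])
    print(distance(a1, a2))  # distances 0 and 5 (a 3-4-5 triangle)
```

The intermediate array holds N*C*K elements, which is the memory cost of removing the loops.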

Re: [Numpy-discussion] distance matrix speed

2006-06-16 Thread Johannes Loehnert
Hi,

def dtest():
    A = random( [4,2])
    B = random( [1000,2])

    # drawback: memory usage temporarily doubled
    # solution see below
    d = A[:, newaxis, :] - B[newaxis, :, :]
    # written as 3 expressions for more clarity
    d = sqrt((d**2).sum(axis=2))
    return d

def
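
The solution referred to in the comment is cut off in this excerpt. One possible way to limit the temporary memory, offered only as an assumption about what could work and not as the method from the post, is to process the longer array in blocks. The name `dist_chunked` and the `chunk` parameter are hypothetical; the sketch assumes a current NumPy imported as `np`.

```python
import numpy as np


def dist_chunked(A, B, chunk=256):
    """Pairwise distances, broadcasting over blocks of rows of B.

    Only one (rows of A) x chunk x K block of differences exists at a
    time, so the temporary stays small however many rows B has.
    """
    d = np.empty((A.shape[0], B.shape[0]), dtype=float)
    for start in range(0, B.shape[0], chunk):
        stop = start + chunk  # slicing past the end is safe in NumPy
        diff = A[:, np.newaxis, :] - B[np.newaxis, start:stop, :]
        d[:, start:stop] = np.sqrt((diff ** 2).sum(axis=2))
    return d
```

A larger `chunk` means fewer Python-level iterations at the cost of a bigger temporary; the best value depends on the machine and would need to be measured.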

Re: [Numpy-discussion] distance matrix speed

2006-06-16 Thread Tim Hochberg
Sebastian Beca wrote:

Hi, I'm working with NumPy/SciPy on some algorithms and I've run into some important speed differences wrt Matlab 7. I've narrowed the main speed problem down to the operation of finding the Euclidean distance between two matrices that share one dimension (dist in

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Tim Hochberg
Christopher Barker wrote:

Bruce Southey wrote:

Please run the exact same code in Matlab that you are running in NumPy. Many of Matlab's functions are very highly optimized, so they are provided as binary functions. I think that you are running into this so you are not doing the correct

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Sebastian Beca
Thanks! Avoiding the inner loop is MUCH faster (~20-300 times faster than the original). Nevertheless, I don't think I can use hypot, as it only works for two dimensions. The general problem I have is:

A = random( [C, K] )
B = random( [N, K] )

C ~ 1-10
N ~ Large (thousands, millions.. i.e. my dataset)
K

Re: [Numpy-discussion] Distance Matrix speed

2006-06-16 Thread Sebastian Beca
Please replace:

C = 4
N = 1000
d = zeros([C, N], dtype=float)

BK.