I think you want something like this: x1 = x1 * weights[np.newaxis,:] x2 = x2 * weights[np.newaxis,:]
x1 = x1[np.newaxis, :, :] x2 = x2[:, np.newaxis, :] distance = np.sqrt( ((x1 - x2)**2).sum(axis=-1) ) x1 and x2 are arrays with size of (npoints, ndimensions), and npoints can be different for each array. I'm not sure I did your weights right, but that part shouldn't be so difficult. On May 21, 2008, at 2:39 PM, Emanuele Olivetti wrote: > Dear all, > > I need to speed up this function (a little example follows): > ------ > import numpy as N > def distance_matrix(data1,data2,weights): > rows = data1.shape[0] > columns = data2.shape[0] > dm = N.zeros((rows,columns)) > for i in range(rows): > for j in range(columns): > dm[i,j] = ((data1[i,:]-data2[j,:])**2*weights).sum() > pass > pass > return dm > > size1 = 4 > size2 = 3 > dimensions = 2 > data1 = N.random.rand(size1,dimensions) > data2 = N.random.rand(size2,dimensions) > weights = N.random.rand(dimensions) > dm = distance_matrix(data1,data2,weights) > print dm > ------------------ > The distance_matrix function computes the weighted (squared) euclidean > distances between each pair of vectors from two sets (data1, data2). > The previous naive algorithm is extremely slow for my standard use, > i.e., when size1 and size2 are in the order of 1000 or more. It can be > improved using N.subtract.outer: > > def distance_matrix_faster(data1,data2,weights): > rows = data1.shape[0] > columns = data2.shape[0] > dm = N.zeros((rows,columns)) > for i in range(data1.shape[1]): > dm += N.subtract.outer(data1[:,i],data2[:,i])**2*weights[i] > pass > return dm > > This algorithm becomes slow when dimensions (i.e., data1.shape[1]) is > big (i.e., >1000), due to the Python loop. In order to speed it up, > I guess > that N.subtract.outer could be used on the full matrices instead of > one > column at a time. But then there is a memory issue: 'outer' allocates > too much memory since it stores all possible combinations along all > dimensions. This is clearly unnecessary. > > Is there a NumPy way to avoid all Python loops and without wasting > too much memory? As a comparison I coded the same algorithm in > C through weave (inline): it is _much_ faster and requires just > the memory to store the result. But I'd prefer not using C or weave > if possible. > > Thanks in advance for any help, > > > Emanuele > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion ---- Rob Hetland, Associate Professor Dept. of Oceanography, Texas A&M University http://pong.tamu.edu/~rob phone: 979-458-0096, fax: 979-845-6331 _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion