Dear all, I have written the following function to compute the square distances of a large matrix (each sample a row). It compute row by row and print the overall progress. The progress output is important and I didn't use matrix multiplication.
I give as input a 70,000x800 matrix. The output should be a 70,000x70,000 matrix. The program runs really slow (16 hours for 1/3 progress). And it eats 36G memory (fortunately I have enough). Could you give some insights on how to modify the code to be efficient and to eat less memory? thanks, Ke Sun def dist2_large( data ): import time if data.ndim != 2: raise RuntimeError( "data should be a matrix" ) N,D = data.shape print 'using the sample-wise implementation' print '%d samples, %d dimensions' % (N,D) start_t = time.time() d2 = np.zeros( [N,N] ) for i in range( N ): print "\r%5d/%d" % (i+1, N), for j in range( N ): d2[i,j] = ((data[i] - data[j])**2).sum() total_t = time.time() - start_t hours = (total_t / 3600) minutes = (total_t % 3600) / 60 print "\nfinished in %2dh%2dm" % (hours, minutes) return d2 _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion