Hi, On Tue, Oct 8, 2013 at 12:44 PM, Matthew Brett <matthew.br...@gmail.com> wrote: > Hi, > > On Tue, Oct 8, 2013 at 4:38 AM, Ke Sun <sunk...@gmail.com> wrote: >> On Tue, Oct 08, 2013 at 01:49:14AM -0700, Matthew Brett wrote: >>> Hi, >>> >>> On Tue, Oct 8, 2013 at 1:06 AM, Ke Sun <sunk...@gmail.com> wrote: >>> > Dear all, >>> > >>> > I have written the following function to compute the square distances of >>> > a large >>> > matrix (each sample a row). It compute row by row and print the overall >>> > progress. >>> > The progress output is important and I didn't use matrix multiplication. >>> > >>> > I give as input a 70,000x800 matrix. The output should be a 70,000x70,000 >>> > matrix. The program runs really slow (16 hours for 1/3 progress). And it >>> > eats >>> > 36G memory (fortunately I have enough). >>> >>> That is very slow. >>> >>> As a matter of interest - why didn't you use matrix multiplication? >> Because it will cost hours and I want to see the progress and >> know how far it goes. Another concern is to save memory and >> compute one sample at a time. >> >>> On a machine I had access to it took about 20 minutes. >> How? I am using matrix multiplication (the same code as >> http://stackoverflow.com/a/4856692) and it runs for around 18 hours. > > I wonder if you are running into disk swap - the code there does > involve a large temporary array. > > I believe the appended version of the code is correct, and I think it > is also memory efficient. > > On a fast machine with lots of memory, it ran in about 5 minutes. > It's using EPD, which might be using multiple cores for the matrix > multiply. > > Does the code also work for you in reasonable time? > > Another suggestion I saw which only calculates the unique values (say > lower diagonal) is scipy.spatial.distance > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist > > To a first pass that seems to be slower than the matrix multiply.
On a machine that should have been about as fast: import scipy.spatial.distance as ssd dists = ssd.pdist(A, 'sqeuclidean') on the same size matrix took about 40 minutes. Cheers, Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion