Dear Paul,
Remember that for K=283300, you'll have K*(K+1)/2 = 4.0e10 distinct distance 
values. Stored as Float64 (8 bytes each), that is about 320 GB, and 
Distances.pairwise allocates the full K-by-K matrix, which doubles it to about 
640 GB. The reason you get an OOM error is that this exceeds the RAM of your 
machine. So consider whether you really need a quadratic algorithm on such a 
large number of samples.
If so, you'll have to carefully split the dataset into P=10 splits, and then 
call Distances.pairwise on every pair of splits (p, q) with 1 <= p <= q <= P, 
freeing memory as you go. That makes P*(P+1)/2 = 55 iterations, each of them 
allocating a 28330-by-28330 Float64 block of about 6.4 GB. Write each of these 
55 blocks to your hard drive as soon as it is computed.
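
A minimal sketch of that block-wise loop, in case it helps. It assumes a recent 
Distances.jl in which pairwise accepts the dims keyword (dims=2 compares 
columns), and that X holds one sample per column. The helper name 
blockwise_pairwise, the file naming, and the use of the Serialization standard 
library for storage are my own choices for illustration, not part of 
Distances.jl.

```julia
using Distances
using Serialization  # standard library; used here only to write blocks to disk

# Hypothetical helper: compute pairwise distances block by block.
# X holds one sample per column (features x samples).
function blockwise_pairwise(X::AbstractMatrix, P::Integer, outdir::AbstractString)
    K = size(X, 2)
    # Column ranges of the splits: at most P of them, the last may be shorter.
    step = cld(K, P)
    splits = [i:min(i + step - 1, K) for i in 1:step:K]
    for p in 1:length(splits), q in p:length(splits)
        # Distances between the samples of split p and those of split q.
        block = pairwise(Euclidean(), X[:, splits[p]], X[:, splits[q]], dims=2)
        # Store the block on disk; only one block is alive at a time, since
        # `block` is rebound on the next iteration and the old one can be freed.
        serialize(joinpath(outdir, "dist_$(p)_$(q).jls"), block)
    end
    return splits
end
```

Each file can be read back with deserialize, and the block for (q, p) with 
q > p is simply the transpose of the stored block for (p, q).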

Do you really need all distance values? If your final application is, e.g., 
clustering, there are approximate (suboptimal) large-scale algorithms that have 
a lower complexity.

Vincent.
On 12 October 2015 at 20:20:33, Paul Analyst ([email protected]) wrote:

Unfortunately, for the big file I run out of memory :)

julia> F1=load("F.jd","F")
283300x266 Array{Float64,2}:
julia> mapa = Distances.pairwise(Euclidean(), F1')
ERROR: OutOfMemoryError()
in pairwise at C:\Users\SAMSUNG2\.julia\v0.4\Distances\src\generic.jl:132

Do you have any hints for big sets?

Paul

On 2015-10-12 at 12:23, Vincent Lostanlen wrote:

Dear Paul,

For k=100 and your purpose, parallelization is probably not what limits 
performance here. I advise you to use the Distances.jl package.
Since Julia stores arrays in column-major order and pairwise compares columns, 
you will first need to transpose the matrix D or, better, to define it from the 
start as an n*k matrix instead of k*n.
Once you've ensured that, calling
mapa = Distances.pairwise(Euclidean(), D)

should give you at least a 100x speedup over the for loop you've written, so 
parallelization should no longer be necessary.
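
As a small sanity check, here is a sketch that compares the explicit double 
loop with the single pairwise call. The sizes and the random D are made up for 
illustration, and the dims=2 keyword assumes a recent Distances.jl (older 
releases compare columns without it).

```julia
using Distances

k, n = 100, 5           # hypothetical sizes: k samples, n features
D = rand(n, k)          # one sample per column (n*k layout)

# Reference: the explicit double loop, written serially.
mapa_loop = zeros(k, k)
for j in 1:k, i in 1:k
    mapa_loop[i, j] = sqrt(sum(abs2, D[:, i] - D[:, j]))
end

# Same k-by-k distance matrix from a single call.
mapa = pairwise(Euclidean(), D, dims=2)
```

Both matrices should agree up to floating-point rounding, which isapprox(mapa, 
mapa_loop; atol=1e-5) can confirm.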
Vincent.

On Sunday, 11 October 2015 at 16:34:27 UTC+2, paul analyst wrote:
Like this; what is wrong?
k=100
mapa=zeros(k,k)

julia> @parallel for i=1:k,j=1:k
       mapa[i,j]=sqrt(sum([D[i,:]-D[j,:]].^2))
       end
ERROR: syntax: invalid assignment location

Paul
