Thanks, huge help :)
Unfortunately I must compute all of them ;/ but only one time ;)
Paul
On 2015-10-12 at 20:52, Vincent Lostanlen wrote:
Dear Paul,
Remember that for K=283300, you’ll have K*(K+1)/2 = 4.0e10 distance
values.
Stored as Float64 (8 bytes each), that is over 320 gigabytes in memory.
You get an OOM error because this exceeds the RAM of your machine. So
consider whether you really need a quadratic algorithm on such a large
number of samples.
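To make the estimate concrete, here is a minimal sketch of the arithmetic. It
assumes the result is stored as Float64 (the element type of F1 in the
session quoted below); the variable names are only illustrative.

```julia
K = 283_300                          # number of samples, from the thread
n_pairs = K * (K + 1) ÷ 2            # distinct pairs, diagonal included
bytes_per_value = sizeof(Float64)    # 8 bytes per distance

gb_triangle = n_pairs * bytes_per_value / 1e9   # upper triangle only
gb_full     = K^2 * bytes_per_value / 1e9       # full K x K matrix

println(n_pairs)                          # 40129586650, about 4.0e10
println(round(gb_triangle; digits = 1))   # 321.0
println(round(gb_full; digits = 1))       # 642.1
```

Either figure is far beyond the RAM of an ordinary desktop machine.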
If so, you’ll have to carefully split the dataset into P=10 splits, and
then call Distances.pairwise over all pairs of splits p and q with
1 <= p <= q <= P while progressively freeing memory. That will be
P*(P+1)/2 = 55 iterations, each of them allocating about 6.4 GB for a
Float64 block of 28330 x 28330 values. Store each of these 55 matrices
on your hard drive as you go.
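The splitting scheme described above can be sketched as follows. This is only
an illustration under stated assumptions: it targets a current Distances.jl
release (which expects the `dims` keyword), it stores one sample per column,
and the helper name `blocked_pairwise`, the file naming and the use of the
Serialization standard library are hypothetical choices, not something from
this thread.

```julia
using Distances
using Serialization   # standard library; just one simple way to write to disk

# Hypothetical helper: pairwise Euclidean distances computed block by block.
# X holds one sample per column (features x samples).
function blocked_pairwise(X::AbstractMatrix, P::Integer, outdir::AbstractString)
    K = size(X, 2)
    # Split the column indices 1:K into P contiguous ranges of similar size.
    edges = round.(Int, range(0, K; length = P + 1))
    blocks = [(edges[p] + 1):edges[p + 1] for p in 1:P]
    files = String[]
    for p in 1:P, q in p:P
        # Distances between the samples of block p and those of block q.
        B = pairwise(Euclidean(),
                     view(X, :, blocks[p]), view(X, :, blocks[q]); dims = 2)
        path = joinpath(outdir, "dist_$(p)_$(q).bin")
        serialize(path, B)     # write this block to disk
        push!(files, path)
        # B is rebound on the next iteration, so the old block can be freed.
    end
    return files
end

# Small usage example: 50 samples with 4 features, P = 5 splits.
X = rand(4, 50)
files = blocked_pairwise(X, 5, mktempdir())
println(length(files))   # 15 blocks, i.e. P*(P+1)/2
```

Only one block lives in memory at a time, so peak usage is set by the block
size rather than by the full K x K matrix.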
Do you really need all distance values? If your final application is,
e.g., clustering, there are suboptimal large-scale algorithms that have
a lower complexity.
Vincent.
On 12 October 2015 at 20:20:33, Paul Analyst ([email protected]) wrote:
Unfortunately, for the big file I run out of memory :)
julia> F1=load("F.jd","F")
283300x266 Array{Float64,2}:
julia> mapa = Distances.pairwise(Euclidean(), F1')
ERROR: OutOfMemoryError()
in pairwise at C:\Users\SAMSUNG2\.julia\v0.4\Distances\src\generic.jl:132
Do you have any hints for big sets?
Paul
On 2015-10-12 at 12:23, Vincent Lostanlen wrote:
Dear Paul,
For k=100 and your purpose, parallelization is probably not the main
performance bottleneck here. I advise you to use the Distances.jl
<https://github.com/JuliaStats/Distances.jl> package.
Since Julia stores arrays in column-major order
<https://julia.readthedocs.org/en/latest/manual/performance-tips/#access-arrays-in-memory-order-along-columns>,
and pairwise treats each column as one sample, you will first need to
transpose the matrix D or, better, define it from the start as an n*k
matrix instead of k*n.
Once you've ensured that, calling
mapa = Distances.pairwise(Euclidean(), D)
should give you at least a 100x speedup over the for loop you've
written, so parallelization should no longer be necessary.
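As a small illustration of the layout point, the sketch below compares
pairwise against an explicit double loop on random data. It assumes a current
Distances.jl release, where the `dims` keyword is expected; the sizes and the
names `X` and `ref` are made up for the example.

```julia
using Distances

k, n = 100, 5
D = rand(k, n)          # one sample per row, as in the original loop
X = permutedims(D)      # n x k: one sample per column

# Distances between all pairs of columns of X.
mapa = pairwise(Euclidean(), X; dims = 2)

# Reference result from an explicit double loop over the rows of D.
ref = [sqrt(sum(abs2, D[i, :] .- D[j, :])) for i in 1:k, j in 1:k]

println(size(mapa))                        # (100, 100)
println(maximum(abs.(mapa .- ref)) < 1e-6) # true
```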
Vincent.
On Sunday, 11 October 2015 at 16:34:27 UTC+2, paul analyst wrote:
Like here, what is wrong?
k=100
mapa=zeros(k,k)
julia> @parallel for i=1:k,j=1:k
mapa[i,j]=sqrt(sum([D[i,:]-D[j,:]].^2))
end
ERROR: syntax: invalid assignment location
Paul