You could create a "phony" 2-dimensional array that computes the distances
on the fly… but you won't be able to pass this matrix to, e.g., BLAS.
struct DistanceMatrix <: AbstractArray{Float64, 2}
    locs::Array{Float64, 2} # a 2xN or 3xN matrix containing the location coordinates
end
# one row and one column per location, i.e. per column of locs
Base.size(A::DistanceMatrix) = (size(A.locs, 2), size(A.locs, 2))
# dist is assumed to be your own distance function; could be further optimized
Base.getindex(A::DistanceMatrix, i::Int, j::Int) = dist(A.locs[:, i], A.locs[:, j])
(Untested)
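For anyone who wants to try the idea quickly, here is a minimal self-contained sketch for a current Julia version. The names `LazyDistanceMatrix` and `euclid` are made up for this illustration (the type is renamed so it does not clash with the definition above), and plain Euclidean distance is an assumption; swap in whatever metric your model needs.

```julia
# Sketch only: `LazyDistanceMatrix` and `euclid` are illustrative names.
struct LazyDistanceMatrix <: AbstractMatrix{Float64}
    locs::Matrix{Float64}   # d×N matrix, one column per location
end

# Euclidean distance between columns i and j, computed without allocating slices
function euclid(locs::Matrix{Float64}, i::Int, j::Int)
    s = 0.0
    for k in 1:size(locs, 1)
        s += (locs[k, i] - locs[k, j])^2
    end
    return sqrt(s)
end

# The matrix is N×N, where N is the number of columns (locations)
Base.size(A::LazyDistanceMatrix) = (size(A.locs, 2), size(A.locs, 2))
Base.getindex(A::LazyDistanceMatrix, i::Int, j::Int) = euclid(A.locs, i, j)

locs = [0.0 3.0 0.0;
        0.0 4.0 1.0]          # three points in 2-D
D = LazyDistanceMatrix(locs)

size(D)     # (3, 3)
D[1, 2]     # 5.0
sum(D)      # generic AbstractArray methods work; entries are computed on demand
```

Only the coordinates are stored, so memory grows with N rather than N^2; the price is that every lookup recomputes the distance.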
On Friday, August 14, 2015 at 11:03:06 AM UTC-4, Stefan Karpinski wrote:
>
> On Friday, August 14, 2015, Charles Novaes de Santana wrote:
>
>>
>> 1) to use only the subset of suitable habitats to build the matrix of
>> distances (and then to use sparse matrix as suggested by Stefan)
>>
>
> Distance matrices are not usually sparse, since pairs of points that are
> far apart have large distances and are both the most common and the least
> interesting. However, you could store only distances for close points in a
> sparse matrix and use zero to represent the distance between pairs of
> points that are not close enough to be of interest. Either that or you
> could store 1/d instead of d and then closer points have higher weights and
> you can threshold 1/distance so that far apart points have zero entries.
>
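A rough sketch of the thresholded 1/d variant described above, using the `SparseArrays` standard library. The function name `sparse_inverse_distances` and the `cutoff` parameter are made up for this illustration, and Euclidean distance is an assumption.

```julia
using SparseArrays   # standard library in Julia 1.x

# Sketch only: `sparse_inverse_distances` and `cutoff` are illustrative names.
# Stores w = 1/d for pairs closer than `cutoff`; all other entries stay zero.
function sparse_inverse_distances(locs::Matrix{Float64}, cutoff::Float64)
    n = size(locs, 2)
    rows = Int[]; cols = Int[]; vals = Float64[]
    for j in 1:n, i in 1:j-1
        d = sqrt(sum(abs2, view(locs, :, i) .- view(locs, :, j)))
        if 0 < d <= cutoff            # skip coincident points to avoid 1/0
            push!(rows, i); push!(cols, j); push!(vals, 1 / d)
            push!(rows, j); push!(cols, i); push!(vals, 1 / d)   # keep it symmetric
        end
    end
    return sparse(rows, cols, vals, n, n)
end

locs = [0.0 1.0 10.0;
        0.0 0.0  0.0]
W = sparse_inverse_distances(locs, 2.0)
nnz(W)      # 2: only the pair (1, 2) is within the cutoff
W[1, 2]     # 1.0
W[1, 3]     # 0.0, the pair is too far apart to be stored
```

Memory here scales with the number of close pairs, but the double loop still visits every pair once, so the running time remains O(n^2).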
>
>> 2) to use a machine with more memory and try to run my models using the
>> matrices with all the sites
>>
>
> This is probably the easiest thing to do since your data set is not of a
> truly unreasonable size, just largish. However, you may be much happier if
> you can make your problem smaller than O(n^2).
>
>
>> 3) to try another language/library that might work better with such big
>> amount of data (like python, or R).
>>
>
> This problem isn't going to be fundamentally different no matter what
> language you use: you have more data than fits in memory. Spilling memory
> to disk is going to be *much* slower than just recomputing distances –
> orders of magnitude slower. As John suggested, is there any particular
> reason you need to materialize all of these values in a matrix? What
> computation are you going to perform over that matrix?
>