On Friday, August 14, 2015, Charles Novaes de Santana <
[email protected]> wrote:

>
> 1) to use only the subset of suitable habitats to build the matrix of
> distances (and then to use sparse matrix as suggested by Stefan)
>

Distance matrices are not usually sparse: the farthest-apart pairs of points
are the most numerous and the least interesting, and their entries are large
rather than zero. However, you could store only the distances between close
points in a sparse matrix and use zero to represent pairs that are not close
enough to be of interest. Alternatively, you could store 1/d instead of d, so
that closer points have higher weights, and threshold 1/d so that far-apart
points get zero entries.
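I don't know what your coordinates look like, but here is a minimal sketch of
that thresholding idea in Julia (the function name, the one-site-per-column
layout, and the cutoff value are just made up for illustration; the loop is
still O(n^2) in time, it just avoids storing the far-apart pairs):

    using SparseArrays, LinearAlgebra

    # Build a sparse matrix of 1/distance for pairs of sites closer than `cutoff`;
    # pairs farther apart are left as structural zeros, so they drop out entirely.
    function inverse_distance_weights(coords::AbstractMatrix, cutoff::Real)
        n = size(coords, 2)                      # one site per column
        rows, cols, vals = Int[], Int[], Float64[]
        for j in 1:n, i in 1:j-1
            d = norm(view(coords, :, i) - view(coords, :, j))
            if 0 < d <= cutoff
                w = 1 / d
                push!(rows, i); push!(cols, j); push!(vals, w)
                push!(rows, j); push!(cols, i); push!(vals, w)   # keep it symmetric
            end
        end
        sparse(rows, cols, vals, n, n)
    end

    # Example: 10_000 random sites in the unit square, keep neighbors within 0.01
    coords = rand(2, 10_000)
    W = inverse_distance_weights(coords, 0.01)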


> 2) to use a machine with more memory and try to run my models using the
> matrices with all the sites
>

This is probably the easiest thing to do since your data set is not of a
truly unreasonable size, just largish. However, you may be much happier if
you can make your problem smaller than O(n^2).
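For a rough sense of what "largish" means in memory terms (I don't know your
exact number of sites, so the 50,000 below is purely illustrative), a dense
Float64 distance matrix costs 8*n^2 bytes:

    # Back-of-the-envelope memory cost of a dense n×n Float64 distance matrix.
    n = 50_000                        # illustrative site count, not from your data
    bytes = 8 * n^2                   # 8 bytes per Float64 entry
    println(bytes / 2^30, " GiB")     # ≈ 18.6 GiB for this n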


> 3) to try another language/library that might work better with such a big
> amount of data (like python, or R).
>

This problem isn't going to be fundamentally different no matter what
language you use: you have more data than fits in memory. Spilling memory
to disk is going to be *much* slower than just recomputing distances, by
orders of magnitude. As John suggested, is there any particular
reason you need to materialize all of these values in a matrix? What
computation are you going to perform over that matrix?
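If the computation only ever needs one distance (or one row) at a time, you
can recompute on the fly instead of materializing anything. A rough sketch of
that idea, again with the coordinate layout and the per-pair work as
placeholder assumptions:

    using LinearAlgebra

    # Visit every pairwise distance once without ever storing the n×n matrix.
    # `coords` is d×n (one site per column); `f` is whatever per-pair work you need.
    function foreach_distance(f, coords::AbstractMatrix)
        n = size(coords, 2)
        for j in 1:n, i in 1:j-1
            f(i, j, norm(view(coords, :, i) - view(coords, :, j)))  # recomputed, never stored
        end
    end

    # Example: count pairs of sites within 0.01 of each other
    coords = rand(2, 10_000)
    close = Ref(0)
    foreach_distance(coords) do i, j, d
        d <= 0.01 && (close[] += 1)
    end
    @show close[]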
