[R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Corrado
Dear friends,

I have several sets of points in a transformed environmental space. Each set 
of points can be represented as a cloud in the environmental space.

This space is spanned by n coordinates, corresponding to the first n PCs of 36 
PCs of some environmental variables (12 monthly minimum temperatures, 12 
monthly maximum temperatures, 12 monthly precipitations).

I would like to calculate a distance or dissimilarity between each pair of 
sets of points.

Let's label two of those sets as X,Y, where x is in X and y is in Y. We are 
interested in defining a distance between X and Y. I have thought of using the 
following:

1) The Euclidean distance between the centroids of X and Y. Simple and 
effective but does not give much real information on the actual degree of 
overlapping.
2) The median of all the distances between all pairs of points (x,y). Same 
problem as (1), partially resolved.
3) The proportion of points of X U Y which fall outside the intersection of 
the convex or concave hulls (defined with a smoothing parameter) of X and Y, 
i.e. C(X) intersect C(Y). Very complicated, and does not necessarily lead to

What do you think? Are there any other approaches worth considering?  
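
As a rough illustration of options (1) and (2), here is a minimal R sketch. The matrices X and Y are simulated stand-ins for two clouds of PC scores (rows are points, columns are PCs), and the helper names centroid_dist, cross_dist and median_cross_dist are hypothetical, not taken from any package.

```r
# Simulated stand-ins for two sets of points in a 3-PC space.
set.seed(1)
X <- matrix(rnorm(50 * 3), ncol = 3)
Y <- matrix(rnorm(40 * 3, mean = 2), ncol = 3)

# (1) Euclidean distance between the centroids of X and Y.
centroid_dist <- function(X, Y) {
  sqrt(sum((colMeans(X) - colMeans(Y))^2))
}

# Matrix of Euclidean distances d(x, y) for every x in X and y in Y.
cross_dist <- function(X, Y) {
  # |x - y|^2 = |x|^2 + |y|^2 - 2 x.y; clip tiny negatives from rounding.
  d2 <- outer(rowSums(X^2), rowSums(Y^2), "+") - 2 * tcrossprod(X, Y)
  sqrt(pmax(d2, 0))
}

# (2) Median of all between-set distances.
median_cross_dist <- function(X, Y) median(cross_dist(X, Y))

print(centroid_dist(X, Y))
print(median_cross_dist(X, Y))
```

Both measures are symmetric in X and Y, but neither says much about how far the two clouds overlap, which is the limitation noted in the list above.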

Kind Regards
-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Charlotte Maia
Well, here's another naive post from me (hopefully better than the last one).

Firstly, I'm not sure computing Euclidean distance is that simple. I
would assume temperatures and precipitation would need to be
standardised in some way.
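
If the PCA is run on the raw variables, precipitation (in mm) can dominate the temperature variables simply because its variance is larger. One common remedy is to scale each variable to unit variance inside prcomp(). The sketch below assumes a made-up matrix env with hypothetical columns tmin, tmax and prec. The real data would have 36 columns.

```r
# Made-up environmental variables on very different scales.
set.seed(2)
env <- cbind(tmin = rnorm(100, mean = 5,  sd = 3),
             tmax = rnorm(100, mean = 15, sd = 4),
             prec = rnorm(100, mean = 80, sd = 40))

# PCA on centred but unscaled variables: the high-variance column dominates PC1.
pca_raw <- prcomp(env, center = TRUE, scale. = FALSE)

# PCA on standardised variables (zero mean, unit variance).
pca_std <- prcomp(env, center = TRUE, scale. = TRUE)

print(round(pca_raw$rotation[, 1], 2))  # PC1 loadings, unscaled
print(round(pca_std$rotation[, 1], 2))  # PC1 loadings, standardised

# Scores on the first two standardised PCs, usable as point coordinates.
scores <- pca_std$x[, 1:2]
```

Whether standardising is appropriate depends on the data. If the PCs were already computed from standardised variables, nothing more is needed here.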

I think the notion of how far away something is, and how distinct
location wise something is, are quite different, so maybe two
measures?

For distance per se, I think your first idea is the best.
Plus simple is always good...

For distinctness, given one of two sets, for each point, you could
just compute the closest point to it. If the closest point is a member
of the same set, we will call that a + point; if the closest point is
a member of the other set, we will call it a - point. In principle the
measure of distinctness would be the sum of the +'s. However, there
might need to be some scaling to take into account the number of
points in each set.
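
A minimal sketch of this nearest-neighbour idea in R, assuming the two sets are matrices with one row per point. The function name nn_distinctness is hypothetical, and dividing the count of + points by the set size is just one possible way of handling the scaling issue.

```r
# Proportion of "+" points in each set: points whose nearest neighbour
# (excluding themselves) belongs to the same set.
nn_distinctness <- function(X, Y) {
  pts   <- rbind(X, Y)
  label <- rep(c("X", "Y"), c(nrow(X), nrow(Y)))
  d <- as.matrix(dist(pts))     # all pairwise Euclidean distances
  diag(d) <- Inf                # a point is not its own neighbour
  nn   <- apply(d, 1, which.min)  # index of each point's nearest neighbour
  same <- label[nn] == label      # TRUE for "+" points
  c(X = mean(same[label == "X"]), Y = mean(same[label == "Y"]))
}

set.seed(4)
X <- matrix(rnorm(50 * 2), ncol = 2)
Y <- matrix(rnorm(40 * 2, mean = 1), ncol = 2)
print(nn_distinctness(X, Y))
```

Values near 1 suggest two well separated sets, while lower values suggest interleaved sets. The two proportions can differ, so the measure is not symmetric in X and Y.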

There are also a lot of fancy things out there, so someone will
probably come up with a much fancier (and possibly better) idea than
this.

Well, that's just my rant, before I go to bed.


kind regards
-- 
Charlotte Maia
http://sites.google.com/site/maiagx/home



Re: [R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Mario Valle
Silhouette coefficients?
They measure, for each point, how similar it is to the other points in its own 
cluster and how dissimilar it is from the points of other clusters.

P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison-Wesley, 
2006 page 541
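
A minimal sketch of silhouette widths in R, assuming the recommended package cluster is installed and treating each set as one "cluster". The data are simulated, and averaging the widths per set is only one possible way of turning the point-level values into a set-level summary.

```r
library(cluster)

# Simulated stand-ins for two sets of points in a 2-PC space.
set.seed(3)
X <- matrix(rnorm(50 * 2), ncol = 2)
Y <- matrix(rnorm(40 * 2, mean = 4), ncol = 2)

pts    <- rbind(X, Y)
set_id <- rep(1:2, c(nrow(X), nrow(Y)))   # set membership as integer codes

# One silhouette width per point, in [-1, 1].
sil <- silhouette(set_id, dist(pts))

# Average width per set, and over all points.
per_set <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)
print(per_set)
print(mean(sil[, "sil_width"]))
```

Widths close to 1 mean a point sits well inside its own set, and widths near 0 or below mean it is as close to the other set as to its own. With more than two sets, the same call could be run on each pair of sets in turn.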

Hope it helps.
mario


-- 
Ing. Mario Valle
Data Analysis and Visualization Group| http://www.cscs.ch/~mvalle
Swiss National Supercomputing Centre (CSCS)  | Tel:  +41 (91) 610.82.60
v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax:  +41 (91) 610.82.82



Re: [R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Corrado
Thanks Mario! (Or should that be "grazie, Mario"?)

- Can those silhouette coefficients be used for distances between sets, or only 
for distances from a point to a set?

- Where did you get the other post you attached? It did not come up when I 
searched the mailing list!
 

Best,


-- 
Corrado Topi

Global Climate Change  Biodiversity Indicators
Area 18,Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk



Re: [R] Distance between sets of points in transformed environmental space

2009-12-01 Thread Charlotte Maia
Hi Corrado,

I was thinking about this some more.

Maybe you could use a linear discriminant, i.e. a (hyper)plane that
partitions your points into two sets, such that the misclassification
rate is minimised.

Closeness could be regarded as the number of misclassified points.
Two sets would be distant if no points are misclassified.

I am assuming there is a standard function in R to do this, no idea
what it is though. Plus this is a reasonably well known technique.

Again, the size of the sets needs to be accounted for.
There is also the question of whether the distance of set A from B needs
to be the same as the distance of set B from A. Neither the nearest
neighbour approach nor the discriminant approach necessarily satisfies
this condition.
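
A minimal sketch of the discriminant idea, assuming the recommended package MASS is installed. Note that lda() fits Fisher's linear discriminant, which does not directly minimise the number of misclassified points, so it is used here only as a readily available stand-in. The function name lda_overlap and the simulated data are hypothetical.

```r
library(MASS)

# Misclassification rates of a linear discriminant separating X from Y.
# Rates are reported per set so that unequal set sizes are visible.
lda_overlap <- function(X, Y) {
  df <- data.frame(rbind(X, Y))
  df$set <- factor(rep(c("X", "Y"), c(nrow(X), nrow(Y))))
  fit  <- lda(set ~ ., data = df)
  pred <- predict(fit, df)$class   # resubstitution predictions
  wrong <- pred != df$set
  c(X = mean(wrong[df$set == "X"]),
    Y = mean(wrong[df$set == "Y"]),
    overall = mean(wrong))
}

set.seed(5)
X      <- matrix(rnorm(50 * 2), ncol = 2)
Y_near <- matrix(rnorm(40 * 2), ncol = 2)             # same distribution as X
Y_far  <- matrix(rnorm(40 * 2, mean = 10), ncol = 2)  # far away from X

res_near <- lda_overlap(X, Y_near)
res_far  <- lda_overlap(X, Y_far)
print(res_near)
print(res_far)
```

The rates are computed on the same points used to fit the discriminant, so they are optimistic. The per-set rates for X and Y can differ, which reflects the asymmetry mentioned above.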

regards
-- 
Charlotte Maia
http://sites.google.com/site/maiagx/home
