[R] how to calculate the consistency of different clusterings

Mao Jianfeng Thu, 13 Jan 2011 07:37:18 -0800

Dear R-listers,

I am clustering tens of individuals on thousands of traits. The true
assignment of each individual is already known. I want to classify the
individuals using randomly resampled subsets of the traits: for example,
100 traits resampled 100 times, then 200 traits resampled 100 times, then
300 traits resampled 100 times, and so on. For each subset of traits, I
cluster the same individuals.
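A minimal sketch of the resampling loop described above, under two assumptions that are not in the original post: the traits sit in a numeric matrix with one row per individual and one column per trait, and each subset is clustered with hierarchical clustering (hclust) cut into k = 2 groups. The simulated matrix and the helper name cluster.subset are hypothetical stand-ins for the real data and method.

```r
set.seed(1)
n.ind <- 12
n.traits <- 1000
# hypothetical simulated trait matrix: rows = individuals, columns = traits
traits <- matrix(rnorm(n.ind * n.traits), nrow = n.ind,
                 dimnames = list(paste("ind", 1:n.ind, sep = ""), NULL))

# Cluster the individuals on a random subset of n.sub traits.
# Assumption: complete-linkage hclust on Euclidean distances, cut into k groups.
cluster.subset <- function(traits, n.sub, k = 2) {
  cols <- sample(ncol(traits), n.sub)   # draw n.sub traits without replacement
  hc <- hclust(dist(traits[, cols, drop = FALSE]))
  cutree(hc, k = k)                     # one cluster label per individual
}

subset.sizes <- c(100, 200, 300)
n.rep <- 100
# one matrix of labels per subset size: rows = individuals, columns = replicates
resampled <- lapply(subset.sizes, function(s)
  replicate(n.rep, cluster.subset(traits, s)))
names(resampled) <- paste("traits", subset.sizes, sep = ".")
dim(resampled$traits.100)   # 12 individuals x 100 replicates
```

Any other clustering method (kmeans, pam, ...) can be swapped into cluster.subset, as long as it returns one label per individual.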


In the end, I want the consistency (as a percentage) between each of
these clusterings (for example, "cluster.1", "cluster.2" and "cluster.3"
in the dummy data) and the known assignment ("populations" in the dummy
data). How can this be implemented, preferably in R?
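One common way to express agreement between two clusterings as a percentage is the Rand index: the share of all pairs of individuals that both partitions treat the same way (both in one group, or both in different groups). Because it only looks at the grouping, it does not matter that the labels are "popA"/"popB" in one column and 4/5 in another. This is a base-R sketch; the function name rand.percent is hypothetical.

```r
# Rand index as a percentage: over all unordered pairs of individuals,
# how often do two partitions agree on "same group" versus "different groups"?
rand.percent <- function(a, b) {
  stopifnot(length(a) == length(b))
  same.a <- outer(a, a, "==")   # TRUE where i and j share a group in a
  same.b <- outer(b, b, "==")   # TRUE where i and j share a group in b
  pairs <- upper.tri(same.a)    # each unordered pair counted once
  100 * mean(same.a[pairs] == same.b[pairs])
}

# vectors copied from the dummy data in this post
pops <- c(rep("popA", 5), rep("popB", 7))
c1 <- c(rep(1, 5), rep(2, 7))
c2 <- c(rep(2, 4), rep(1, 8))
c3 <- c(rep(4, 7), rep(5, 5))
round(sapply(list(cluster.1 = c1, cluster.2 = c2, cluster.3 = c3),
             rand.percent, a = pops), 1)
# cluster.1 cluster.2 cluster.3
#     100.0      83.3      69.7
```

The plain Rand index is inflated by agreement that would occur by chance; the adjusted Rand index (for example adjustedRandIndex() in the mclust package) corrects for this, at the cost of no longer being a simple percentage.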

# dummy data

clus.data <- data.frame(individual = paste("ind", 1:12, sep = ""),
                        populations = c(rep("popA", 5), rep("popB", 7)),
                        cluster.1 = c(rep(1, 5), rep(2, 7)),
                        cluster.2 = c(rep(2, 4), rep(1, 8)),
                        cluster.3 = c(rep(4, 7), rep(5, 5)))

clus.data
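A more literal "percentage of correctly assigned individuals" first has to map the arbitrary cluster numbers onto populations. A simple version of this is cluster purity: credit each cluster with the population most of its members belong to, then count the matches. This sketch runs on the dummy data; purity.percent is a hypothetical helper name. Note that purity lets two clusters map to the same population, so it rewards splitting the individuals into many small clusters; it is most meaningful when the number of clusters equals the number of populations.

```r
# dummy data from the post
clus.data <- data.frame(individual = paste("ind", 1:12, sep = ""),
                        populations = c(rep("popA", 5), rep("popB", 7)),
                        cluster.1 = c(rep(1, 5), rep(2, 7)),
                        cluster.2 = c(rep(2, 4), rep(1, 8)),
                        cluster.3 = c(rep(4, 7), rep(5, 5)))

# Cluster purity as a percentage: each cluster is credited with the count
# of its most common known population; the credits are summed and divided
# by the number of individuals.
purity.percent <- function(known, cluster) {
  tab <- table(cluster, known)          # rows = clusters, columns = populations
  100 * sum(apply(tab, 1, max)) / length(known)
}

res <- sapply(clus.data[, c("cluster.1", "cluster.2", "cluster.3")],
              purity.percent, known = clus.data$populations)
round(res, 1)
# cluster.1 cluster.2 cluster.3
#     100.0      91.7      83.3
```

For a matrix holding one column of cluster labels per resampling replicate (a hypothetical `labels`), apply(labels, 2, purity.percent, known = ...) gives one percentage per replicate, which can then be summarised per subset size.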

Thanks.


-- 
Jian-Feng, Mao

the Institute of Botany,
Chinese Academy of Botany,

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
