Dear R-listers,

I am clustering tens of individuals on thousands of traits. The true assignment of each individual is already known. I want to classify the individuals using randomly resampled subsets of the traits: for example, 100 randomly chosen traits, repeated 100 times, then 200 traits 100 times, then 300 traits 100 times, and so on. For each subset of traits, I cluster the same individuals.
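To make the resampling step concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration only: the trait matrix `traits` is simulated, and hierarchical clustering (`hclust` followed by `cutree` with k = 2) merely stands in for whatever clustering method is actually used.

```r
set.seed(1)                                   # reproducible simulated data
n.ind <- 12
n.traits <- 1000
# Hypothetical trait matrix: rows are individuals, columns are traits
traits <- matrix(rnorm(n.ind * n.traits), nrow = n.ind,
                 dimnames = list(paste("ind", 1:n.ind, sep = ""), NULL))

subset.sizes <- c(100, 200, 300)              # number of traits per subset
n.rep <- 100                                  # resamples per subset size

# For each subset size, draw n.rep random trait subsets and cluster each one.
# Each list element is a matrix: one row per individual, one column per resample.
resampled <- lapply(subset.sizes, function(size) {
  replicate(n.rep, {
    cols <- sample(ncol(traits), size)                  # traits drawn without replacement
    tree <- hclust(dist(traits[, cols, drop = FALSE]))  # stand-in clustering method
    cutree(tree, k = 2)                                 # cluster label per individual
  })
})
names(resampled) <- paste("traits", subset.sizes, sep = ".")
```

Each column of, say, `resampled[["traits.100"]]` would then be one clustering to compare against the known assignment.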
In the end, I want to get the consistency (as a percentage) of each of these clusterings (for example, "cluster.1", "cluster.2" and "cluster.3" in the dummy data below) with the already known assignment ("populations" in the dummy data). How can this be implemented, ideally in R?

# dummy data
clus.data <- data.frame(individual = paste("ind", 1:12, sep = ""),
                        populations = c(rep("popA", 5), rep("popB", 7)),
                        cluster.1 = c(rep(1, 5), rep(2, 7)),
                        cluster.2 = c(rep(2, 4), rep(1, 8)),
                        cluster.3 = c(rep(4, 7), rep(5, 5)))
clus.data

Thanks.

--
Jian-Feng, Mao
the Institute of Botany,
Chinese Academy of Botany

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
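P.S. One possible reading of "consistency", sketched here only as an assumption: because cluster labels are arbitrary (cluster 1 need not mean popA), each cluster is matched to its majority population, and the percentage of individuals that belong to the majority population of their cluster is reported. This is a purity-style measure; the helper name `consistency` is made up for this example.

```r
clus.data <- data.frame(individual = paste("ind", 1:12, sep = ""),
                        populations = c(rep("popA", 5), rep("popB", 7)),
                        cluster.1 = c(rep(1, 5), rep(2, 7)),
                        cluster.2 = c(rep(2, 4), rep(1, 8)),
                        cluster.3 = c(rep(4, 7), rep(5, 5)))

# Hypothetical helper: percentage of individuals that fall in the
# majority population of the cluster they were assigned to
consistency <- function(cluster, known) {
  tab <- table(cluster, known)                 # clusters in rows, populations in columns
  100 * sum(apply(tab, 1, max)) / length(known)
}

agreement <- sapply(clus.data[, c("cluster.1", "cluster.2", "cluster.3")],
                    consistency, known = clus.data$populations)
round(agreement, 1)
# cluster.1 cluster.2 cluster.3
#     100.0      91.7      83.3
```

A measure of this kind grows more lenient as the number of clusters increases, so a chance-corrected index such as `adjustedRandIndex()` from the mclust package may be a better fit if the clusterings differ in their number of groups.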