Do not post in HTML to R-help. Look below at what it did to your data. The error message is telling you that t(data) is not numeric, which usually means at least one column of data is not numeric (character or factor, for example).
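A minimal sketch of how that can happen and one way around it, using base R only. The data frame `dat` and its label column `case` are made up for illustration; the real data may be non-numeric for a different reason. As a side note, the "binary" method of base R's dist() is the Jaccard distance for 0/1 data, so it can serve as a cross-check.

```r
# Hypothetical toy data: the case labels sit in an ordinary column,
# which is one common reason a data frame is not all-numeric.
dat <- data.frame(
  case      = c("case1", "case2", "case3", "case4"),
  variable1 = c(0, 0, 0, 1),
  variable2 = c(0, 0, 1, 0),
  variable3 = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

str(dat)                  # 'case' is chr, the other columns are num
is.numeric(t(dat))        # FALSE: t() turns the whole thing into character

# Move the labels into row names and keep only the 0/1 columns.
m <- as.matrix(dat[, -1])
rownames(m) <- dat$case
is.numeric(t(m))          # TRUE

# Distances between variables (hence the transpose).
# method = "binary" in base R is the Jaccard distance for 0/1 data.
d <- dist(t(m), method = "binary")
print(d)
```

If the problem is a label column like this, the same numeric matrix `t(m)` should also be acceptable to vegdist(); missing values (NA) would still need separate handling.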
> str(data)

That will tell you what kind of data you have.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of marco milella
> Sent: Thursday, December 06, 2012 12:08 PM
> To: r-help@r-project.org
> Subject: [R] clustering of binary data
>
> Good morning,
> I am analyzing a dataset composed of 364 subjects and 13 binary variables
> (0,1 = absence,presence).
> I am testing possible association (co-presence) of my variables. To do
> this, I was trying with cluster analysis.
>
> My main interest is to check for the significance of the obtained clusters.
>
> First, I tried with the pvclust() function, by using method.hclust="ward"
> and method.dist="binary". Altogether it works (clusters and significance
> obtained). However, I'm not convinced by the distance matrix. Associations
> between variables are indeed different from results obtained in PAST by
> using Ward on a Jaccard matrix (that should be ok for binary data).
> Moreover, when I try to obtain a Jaccard matrix in R from my data, by using
> the vegan package
>
> mydistance<-vegdist(t(data),method="jaccard")
>
> I receive the following error message:
>
> Error in rowSums(x, na.rm = TRUE) : 'x' must be numeric
>
> Below is a subset of my dataset:
>
>        variable1  variable2  variable3  variable4  variable5  variable6  variable7  variable8  variable9  variable10 variable11 variable12 variable13
> case1  0          0          0          0          0          1          0          0          1          1          0          0          0
> case2  0          0          0          0          0          1          0          NA         NA         1          0          0          0
> case3  0          0          0          0          0          1          0          0          1          1          0          0          0
> case4  1          0          0          0          0          1          0          1          0          1          0          0          0
> case5  0          0          0          0          0          1          0          0          1          1          0          0          0
> case6  0          1          0          0          0          1          0          1          0          1          0          0          0
> case7  0          1          0          0          0          1          0          0          1          1          0          0          0
> case8  0          0          0          0          0          1          0          1          0          1          0          0          0
> case9  0          0          0          0          0          1          0          1          0          1          0          0          0
> case10 0          0          0          0          0          1          0          0          1          1          0          0          0
> case11 1          0          0          1          0          1          1          1          0          1          0          0          0
> case12 0          0          0          1          1          0          1          1          0          1          0          0          0
> .....
>
> So, my questions are the following: Is the Jaccard index a good strategy
> for my kind of data? Is the binary distance used in pvclust theoretically
> more correct? Is there any alternative to pvclust for testing the
> significance of my clusters?
>
> Thanks in advance
> Marco
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.