Re: [R] Linear discriminant analysis
On 12.10.2023 16:25, Fernando Archuby wrote:
> Hi. I have successfully performed the discriminant analysis with the lda function, and I can classify new individuals with the predict function, but I cannot figure out how the lda results translate into the classification decision. That is, I don't understand how the classification equation for new individuals is constructed from the lda output. I want to understand it, but I also need to communicate it and provide a mechanism for other colleagues to make classifications with their data.
> Thank you very much, Fernando

Do you want to know the principles of the theory behind LDA? That is available in lots of textbooks.

Do you want the implementation details of MASS::lda()? That is hard. It is based on (but does not follow in all details) a paper by Nils Hjort from Norway. A former student of mine, Swetlana Herbrandt, has analysed and reverse engineered the code and wrote down the theory in a German thesis. The implementation uses some nice tricks to get numerically rather stable results that are typically not mentioned in any textbook.

Do you really want to do prediction with LDA? I typically look at the classification performance of LDA as a reference to compare better and more modern techniques with.

I think you should ask some trained local statistician for advice on both the LDA theory and prediction in general.

Best, Uwe Ligges

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear discriminant analysis
It's possible that neither of these will help, but

(1) you can look at the source code of the predict method (MASS:::predict.lda);
(2) you can look at the source reference ("Modern Applied Statistics with S", Venables and Ripley) to see if it gives more information (although it might not); there's a chance that you can get the information you need via a Google Books search.
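For colleagues who need to see the rule itself, the classification can be reproduced by hand from the pieces stored in the fitted object. What follows is only a minimal sketch, assuming the default plug-in rule and using the built-in iris data purely for illustration; it is not a transcription of MASS:::predict.lda, and the names centre, z, m, score and manual_post are made up for this example.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)

# New observations to classify (here simply the training data).
X <- as.matrix(iris[, 1:4])

# Any common centre works, because it cancels when groups are compared;
# the prior-weighted mean of the group means is used here.
centre <- colSums(fit$prior * fit$means)

# Project observations and group means onto the discriminant axes.
# In these coordinates the within-group covariance is the identity.
z <- sweep(X, 2, centre) %*% fit$scaling
m <- sweep(fit$means, 2, centre) %*% fit$scaling

# Score for group k: log prior minus half the squared distance to its mean.
score <- sapply(seq_len(nrow(m)), function(k) {
  log(fit$prior[k]) - 0.5 * rowSums(sweep(z, 2, m[k, ])^2)
})

# Posterior probabilities via a numerically stable softmax over groups.
manual_post <- exp(score - apply(score, 1, max))
manual_post <- manual_post / rowSums(manual_post)
manual_class <- fit$lev[max.col(manual_post, ties.method = "first")]

# Compare with what predict() reports.
pred <- predict(fit, iris)
all(manual_class == as.character(pred$class))
max(abs(manual_post - pred$posterior))
```

So, under these assumptions, everything a colleague needs is fit$prior, fit$means and fit$scaling: project the new observation, measure its squared distance to each projected group mean, adjust by the log prior, and pick the group with the highest score.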
Re: [R] Linear Discriminant Analysis error: Variables appear constant
On 16.05.2011 22:07, Songer, Katherine B - DNR wrote:
> Hi R experts,
> I'm attempting to run Linear Discriminant Analysis using the lda function in the MASS package. I've got around 50 predictor variables and one response variable. My response variable has 5 numeric categories that represent different clusters of fish abundance data (clusters were developed using Bray-Curtis and NMDS), and my predictor variables are environmental variables that might influence the fish data. These data all came from 68 sampling locations. I'm getting an error message:
>
> DALogFish <- lda(Cluster~DrainArea+Flow+StrmWidth+Gradient+NatComm+FishIBIUsed
>   +QHEI+QHEIsub+QHEImwh_h+QHEIcov+QHEIchan+QHEIrip+QHEIpool+QHEIrif+QHEIgrads+
>   QHEIgradv+QHEImwh+QHEIcovtype+QHEIwwh+QHab+QHabBuff+QHabEros+QhabPool+
>   QHabWDRatio+QHabRif+QHabFines+QHabCov+QHabRating+QHabSize+TP+TKN+NH3+NH3Min
>   +NO3NO2N+BOD+TSS+TSSMax+TDS+SSC+SSCMax+Chloride+Sulfate+Ecoli+ChlA+DOper+
>   DOperMin+DOperMin1_5+DOmgL+DOmgLMean+DOmgLMax+Cond+pH+pHMax+Trans+Temp+
>   TempMin+Temp4+Crop100+Crop500+CropSub+Dev100+Dev500+DevSub+For100+For500+
>   ForSub+Pas100+Pas500+PasSub+Wat100+Wat500+WatSub+Wet100+Wet500+WetSub+
>   Undev100+Undev500+UndevTotal+Undev100NoPas+Undev500NoPas+UndevTotNoPas,
>   data=AllData1, na.action=na.omit, CV=TRUE)
> Error in lda.default(x, grouping, ...) : variables 10 38 42 appear to be constant within groups
>
> When I look at the variables listed, they don't appear constant within the groups to me.

We do not know, since we do not have the data.

> I'm new to LDA and am wondering what this error means... Are my data somehow not in the right format? Should I remove colinear variables? (All variables have been normalized.)

Yes, colinear variables should be removed. Note also that you have roughly as many variables in the model as observations (or even more). This won't work either. I think you should read some textbook on the mechanisms behind an LDA.

Uwe Ligges

> Thanks very much!
> Katie
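One way to see which variables trigger the "appear to be constant within groups" message is to compute the within-group standard deviation of each predictor and compare it with the tolerance. The sketch below is an assumption about what the check amounts to (group-centred standard deviations compared against the default tol = 1e-4 of lda); the helper within_group_sd and the toy data are hypothetical and not taken from the thread.

```r
# Hypothetical helper: within-group standard deviation of each column.
within_group_sd <- function(x, g) {
  x <- as.matrix(x)
  g <- as.factor(g)
  # Subtract each observation's own group mean, column by column.
  centred <- apply(x, 2, function(col) col - ave(col, g))
  sqrt(diag(var(centred)))
}

set.seed(1)
g <- rep(1:5, length.out = 68)
toy <- data.frame(
  a = rnorm(68),            # varies within groups
  b = g * 10,               # constant within each group
  c = rnorm(68, sd = 1e-6)  # varies, but far below the tolerance
)

sds <- within_group_sd(toy, g)
flagged <- names(sds)[sds < 1e-4]   # 1e-4 is the default value of lda's tol
flagged
```

Note that a variable can be flagged even though it visibly varies, if its values are on a very small scale; rescaling such a variable, or dropping it, are the usual options.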
Re: [R] Linear Discriminant Analysis error: Variables appear constant
Uwe,

Thank you very much for looking at this. I'm attaching the data, in case you have any wisdom on why variables 10, 38, and 42 would appear constant. Meanwhile, I'll remove colinear variables and read up a little more...

Thanks, Katie
Re: [R] Linear Discriminant Analysis error: Variables appear constant
- Reduce the model to a reasonable size, with far fewer variables than observations.
- Code factors as factors rather than numerics.
- Don't use variables that are perfectly correlated with others, nor any duplicates.

Best, Uwe Ligges
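The three points above can be checked mechanically before calling lda(). The following is a rough sketch on made-up data: the column names echo the thread, but the values, the 0.9999 cut-off and the variable names dat, redundant and X_reduced are assumptions for illustration only.

```r
# Hypothetical stand-in for the real data: 68 rows, a numeric cluster code,
# one duplicated predictor and one that is an exact linear function of another.
set.seed(42)
dat <- data.frame(Cluster = rep(1:5, length.out = 68),
                  Flow = rnorm(68), Temp = rnorm(68), pH = rnorm(68))
dat$FlowCopy <- dat$Flow            # exact duplicate
dat$TempF <- dat$Temp * 1.8 + 32    # perfectly correlated with Temp

# Code the grouping variable as a factor rather than a number.
dat$Cluster <- factor(dat$Cluster)

# Find predictor pairs with (near-)perfect correlation.
X <- dat[, setdiff(names(dat), "Cluster")]
r <- abs(cor(X))
r[upper.tri(r, diag = TRUE)] <- 0     # keep each pair once
pairs <- which(r > 0.9999, arr.ind = TRUE)
redundant <- unique(rownames(r)[pairs[, "row"]])
redundant

# Drop the redundant columns and compare predictors to observations.
X_reduced <- X[, setdiff(names(X), redundant), drop = FALSE]
c(observations = nrow(X_reduced), predictors = ncol(X_reduced))
```

Pairwise correlation only catches pairs; a variable that is a linear combination of several others would need a rank check instead.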
Re: [R] Linear Discriminant Analysis in R
Joris, Thank you, I have corrected my mistakes. I very much appreciate your time and patience. All my best, Cobbler. -- View this message in context: http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2240547.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Linear Discriminant Analysis in R
I checked your data. Now I have to make some sense of your code. You do:

    G <- vowel_features[15]
    cvc_lda <- lda(G ~ vowel_features[15], data=mask_features, na.action=na.omit, CV=TRUE)

Firstly, as I suspected, you need to select a column by using vowel_features[,15]. Mind the comma! Essentially, your data frame is both a list and a matrix. You select by using [x,y], with x being the row number and y being the column number.

Essentially, your code says:

    cvc_lda = lda(vowel_features[,15] ~ vowel_features[,15], ...)

You're modelling a variable on itself, which gives an error. What do you actually want to do? If I take a look at your first code, it appears as if you want to do this:

    cvc_lda <- lda(G ~ ., data=mask_features, na.action=na.omit, CV=TRUE)

The dot indicates you want to model G as a function of all variables in the dataset mask_features. That isn't going to work, as the dimensions are completely wrong:

    > dim(mask_features)
    [1] 671  52
    > dim(vowel_features)
    [1] 254  26

For lda, you need a dataset that has the following structure:

    mydata
    group  V1  V2  V3  V4  ...
    0      x1  y1  z1  q1  ...
    1      x2  y2  z2  q2  ...
    ...

So you can do lda(group ~ V1+V2+V3+V4+..., data=mydata, ...)

For example:

    library(MASS)
    # make some random data
    x <- rep(c(0,1), 50)
    y1 <- rnorm(100, x)
    y2 <- rnorm(100, 1-x)
    # combine it in a data frame
    mydata <- data.frame(x, y1, y2)
    str(mydata)   # look at the structure, you should have something similar
    head(mydata)  # look at the values, this shows you whether it all worked
    # example of lda function
    my.lda <- lda(x ~ y1+y2, data=mydata, CV=TRUE)
    summary(my.lda)

Take a look at your data again, and first figure out which data you actually want to use. Basically, for every observation in G you need a set of linked observations in some variables. But as it is now, it's impossible to link one data frame with the other.

Cheers
Joris
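To make the required layout concrete, here is a small sketch of what a CV=TRUE call returns once the grouping column and the predictors sit in the same data frame, one row per observation. The data are simulated and the names group, cv and confusion are chosen only for this illustration.

```r
library(MASS)

set.seed(123)
# One row per observation: the grouping column and its predictors side by side.
group <- factor(rep(c(0, 1), 50))
mydata <- data.frame(group,
                     y1 = rnorm(100, mean = as.numeric(group) - 1),
                     y2 = rnorm(100, mean = 2 - as.numeric(group)))

# With CV=TRUE, lda() returns leave-one-out predictions, not a model object.
cv <- lda(group ~ y1 + y2, data = mydata, CV = TRUE)
names(cv)                      # includes "class" and "posterior"

# Cross-tabulate true groups against the leave-one-out predictions.
confusion <- table(truth = mydata$group, predicted = cv$class)
confusion
sum(diag(confusion)) / sum(confusion)   # leave-one-out accuracy
```

Because the result of a CV=TRUE call is a plain list rather than a fitted model, predict() cannot be used on it; refit with CV=FALSE when new observations need to be classified.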
Re: [R] Linear Discriminant Analysis in R
Hi Janis, As you have suggested below is the output for the following: test.vowel - vowel_features[,1:10] test.mask - mask_features[,1:10] dput(test.vowel) dput(test.mask) --- NOTE: outputs are limited test_vowel first 12 columns are all zero (total of 26 columns) V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 10 0 0 0 0 0 0 0 0 0 20 0 0 0 0 0 0 0 0 0 30 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 50 0 0 0 0 0 0 0 0 0 60 0 0 0 0 0 0 0 0 0 70 0 0 0 0 0 0 0 0 0 80 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 test_mask (sample output for first 6 columns and 5 rows) V1 V2V3 V4 V5 V6 1 0.034495155 0.990218632 0.601464511 0.014837676 0.058299799 0.818202398 2 0.683688879 0.541566798 0.898061753 0.008456439 0.800863858 0.381366477 3 0.464978895 0.844494807 0.281241401 0.290183593 0.552412608 0.158107894 4 0.200058599 0.270115497 0.179173377 0.341301213 0.672338934 0.322934948 5 0.595020534 0.633111358 0.861024861 0.811241462 0.326562913 0.363330793 dput(test.vowel) structure(list(V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V8 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V9 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V10 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L)), .Names = c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10), class = data.frame, row.names = c(NA, -254L)) dput(test.mask) structure(list(V1 = c(0.034495155, 0.683688879, 0.464978895, 0.877838275, 0.943014871, 0.163438168), V2 = c(0.990218632, 0.541566798, 0.025567579, 0.159811845, 0.13874224, 0.752357297, 0.669662897, 0.854803677, 0.28129096, 0.858919573, 0.98992922, 0.980733255, 0.452405459, 0.376828532, 0.901208552), V3 = c(0.601464511, 0.898061753, 0.38395498, 0.923324665, 0.529832526, 0.182135661), V4 = c(0.014837676, 0.166132726, 0.893089168, 0.45962114, 0.018438501, 0.667720635 ), V5 = c(0.058299799, 0.800863858, 0.552412608, 0.672338934, 0.185407787, 0.691367432), V6 = c(0.818202398, 0.381366477, 0.158107894, 0.322934948, 0.363330793, 0.161321704, 0.052999774, 0.513440813, 0.402895033, 0.201576687, 0.076826481), V7 = c(0.642136394, 0.099776129, 0.148801865, 0.603051825, 0.440594157, 0.215038249, 0.531623479, 0.534920743, 0.45784502, 0.080887221), V8 = c(0.016004048, 0.519115043, 0.149317949, 0.088362708, 0.705002368, 0.185590863, 0.434963787, 0.847410734, 0.78777694, 0.443995646, 0.53903599), V9 = c(0.400620271, 0.918472003, 0.446820588, 0.310981412, 0.734013866, 0.172112916 ), V10 = c(0.532136091, 0.350028839, 0.40424688, 0.607395545, 0.392450857, 0.306530929, 0.756277707, 0.63606622, 0.718866192, 0.258778101)), .Names = c(V1, V2, V3, V4, V5, V6, V7, V8, V9, V10), class = data.frame, row.names = c(NA, -671L)) Thank you once more for your help. I really can not say it enough. ps. original files i work with are attached. Cobbler. http://r.789695.n4.nabble.com/file/n2236083/3dMaskDump.txt 3dMaskDump.txt http://r.789695.n4.nabble.com/file/n2236083/vowel_features.txt vowel_features.txt -- View this message in context: http://r.789695.n4.nabble.com/Linear-Discriminant-Analysis-in-R-tp2231922p2236083.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] Linear Discriminant Analysis in R
Thanks for being patient with me. I guess my problem is with understanding how grouping is used in this particular case.

One of the sample codes I found online (http://www.statmethods.net/advstats/discriminant.html):

    library(MASS)
    fit <- lda(G ~ x1 + x2 + x3, data=mydata, na.action=na.omit, CV=TRUE)

The mydata file in my case is the 3dmaskdump file with 52 columns and 671 rows (all values range between 0 and 1 after they're scaled).

The other file, which I assumed was the grouping file (the vowel_feature file), defines features for the vowels: column 1 of the file is the vowel name (a, i, u) and the other columns hold a distinct combination of 0's and 1's defining the vowel (so this file has 26 columns and 254 rows). Therefore, every column that follows represents a particular feature of that vowel (hope this makes sense!!).

So, the reason I wanted to return G <- vowel_feature[15] in my previous post is that I need to extract a column that represents backness of the vowel (while other columns represent roundedness, nasalization features, etc.). So what (in my mind) G <- vowel_feature[15] would return is 1 column which is 254 rows long with 0's and 1's in it, i.e.

    1    0
    2    1
    3    1
    4    0
    ...
    254  1

I am a novice with R (so I know my questions are pretty dumb!), but I really hope I clarified my confusion a bit better. I very much appreciate your help. Looking forward to your replies.

Thank you again,
Cobbler
Re: [R] Linear Discriminant Analysis in R
It's not your questions, Cobbler, but could you PLEASE just do what we asked for? Copy-paste the following into R and copy-paste ALL the output you get into your next mail.

    test.vowel <- vowel_features[,1:10]
    test.mask <- mask_features[,1:10]
    dput(test.vowel)
    dput(test.mask)

I don't know whether your vowel_features is a list or a data frame (which is technically also a list). But I know for sure that vowel_features[15] is NOT giving you a column. Probably it has to be vowel_features[,15]. So start with that one, and I'll take a look at the rest to get your lda running.

Cheers
Joris

--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] Linear Discriminant Analysis in R
Could you provide us with data to test the code? Use dput (and limit the size!), e.g.:

    dput(vowel_features)
    dput(mask_features)

Without this information, it's impossible to say what's going wrong. It looks like you're doing something wrong in the selection. What should vowel_features[15] return? Did you check it's actually what you want? Did you use str(G) to check the type?

Cheers
Joris
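For anyone unsure what "use dput and limit the size" looks like in practice, here is a small sketch. The data frame big is a hypothetical stand-in for a table read from a file; only head(), str(), dput(), capture.output(), parse() and eval() from base R are used.

```r
# Hypothetical stand-in for a large data frame read from a file.
big <- data.frame(V1 = c("E", "o", "I", "^", "@", "E"),
                  V14 = c(1L, 1L, 1L, 1L, 1L, 1L),
                  V15 = c(1L, 0L, 1L, 0L, 0L, 1L),
                  stringsAsFactors = FALSE)

# Keep only a few rows and columns before posting.
small <- head(big[, c("V1", "V15")], 4)
str(small)                      # check the types first

# dput() writes R code that recreates the object.
txt <- paste(capture.output(dput(small)), collapse = "\n")
cat(txt, "\n")

# Anyone can paste that text back into R to rebuild the object.
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, small)
```

Posting the dput() text rather than a printed table keeps column types and names intact, which is usually what the people answering need.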
Re: [R] Linear Discriminant Analysis in R
cobbler_squad needs more basic help than doing lda. The data input just doesn't make sense. If vowel_features is a data frame, then

    G <- vowel_features[15]

creates another data frame containing the 15th variable in vowel_features, so G is the name of a data frame, not a variable in a data frame. The lda() call makes even less sense. I wonder whether he has gone through the examples in the help file and tried to understand how lda() is used?

Andy

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joris Meys
Sent: Friday, May 28, 2010 8:50 AM
To: cobbler_squad
Cc: r-help@r-project.org
Subject: Re: [R] Linear Discriminant Analysis in R

Could you provide us with data to test the code? Use dput (and limit the size!), e.g.:

    dput(vowel_features)
    dput(mask_features)

Without this information, it's impossible to say what's going wrong. It looks like you're doing something wrong in the selection. What should vowel_features[15] return? Did you check it's actually what you want? Did you use str(G) to check the type?

Cheers
Joris

On Thu, May 27, 2010 at 5:28 PM, cobbler_squad la.f...@gmail.com wrote:

Joris,

You are a life saver. Based on two sample files above, I think lda should go something like this:

    vowel_features <- read.table(file = "mappings_for_vowels.txt")
    mask_features <- data.frame(as.matrix(read.table(file = "3dmaskdump_ICA_37_Combined.txt")))
    G <- vowel_features[15]
    cvc_lda <- lda(G ~ vowel_features[15], data = mask_features, na.action = na.omit, CV = TRUE)

ERROR:

    Error in model.frame.default(formula = G ~ vowel_features[15], data = mask_features, :
      invalid type (list) for variable 'G'

I am clearly doing something wrong declaring G (how should I declare grouping in R when I need to use one column from vowel_feature file)?

Sorry for stupid questions and thank you for being so helpful!

- again, sample files that I am working with:

mappings_for_vowels.txt:

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
    1  E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
    2  o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
    3  I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0
    4  ^  0  0  0  0  0  0  0  0   0   0   0   0   1   0   1   0   0   1   0   0   0   0   0   0   0
    5  @  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0   0   0

and the mask_features file is:

                  V42          V43          V44          V45          V46          V47          V48          V49
    [1,]  2.890891625  2.881188521      2.88778 -2.882606612     -2.77341  2.879834384  2.886483229  2.883815864
    [2,]  2.763404707  2.756198683  2.761863881 -2.756827983 -2.762268531  2.754305072  2.760017050  2.758399799
    [3,]  0.556614506  0.556377530  0.556247414 -0.556300910 -0.556098321  0.557495060  0.557383073  0.556867424
    [4,]  0.367065248  0.366962036  0.366870087 -0.366794442 -0.366644148  0.366613343  0.366537320  0.366953464
    [5,]  0.423692393  0.421835623  0.421741829 -0.421897460 -0.421659824  0.421567705  0.421465738  0.422407838

--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
joris.m...@ugent.be
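To make Andy's point about G concrete, here is a minimal sketch. The small data frame is only a stand-in for vowel_features, built from the V14 and V15 values in the posted sample rows; the real file has more columns, so the column is selected by name rather than by position 15.

```r
# Stand-in for vowel_features (assumption: only three of the posted columns).
vowel_features <- data.frame(
  V1  = c("E", "o", "I", "^", "@"),
  V14 = c(1, 1, 1, 1, 1),
  V15 = c(1, 0, 1, 0, 0)
)

# Single brackets keep the data frame wrapper (a list), which model.frame() rejects.
as_df <- vowel_features["V15"]
class(as_df)     # "data.frame"
is.list(as_df)   # TRUE

# Double brackets (or $) return the column itself; factor() marks it as a grouping.
G <- factor(vowel_features[["V15"]])
class(G)         # "factor"
levels(G)        # "0" "1"

# Inspect the object and print a small, pasteable copy of the data,
# along the lines Joris suggests with str() and dput().
str(G)
dput(head(vowel_features, 3))
```

The str() and dput() calls at the end show the kind of output that makes a question reproducible for other list members.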
Re: [R] Linear Discriminant Analysis in R
Joris,

You are a life saver. Based on two sample files above, I think lda should go something like this:

    vowel_features <- read.table(file = "mappings_for_vowels.txt")
    mask_features <- data.frame(as.matrix(read.table(file = "3dmaskdump_ICA_37_Combined.txt")))
    G <- vowel_features[15]
    cvc_lda <- lda(G ~ vowel_features[15], data = mask_features, na.action = na.omit, CV = TRUE)

ERROR:

    Error in model.frame.default(formula = G ~ vowel_features[15], data = mask_features, :
      invalid type (list) for variable 'G'

I am clearly doing something wrong declaring G (how should I declare grouping in R when I need to use one column from vowel_feature file)?

Sorry for stupid questions and thank you for being so helpful!

- again, sample files that I am working with:

mappings_for_vowels.txt:

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
    1  E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
    2  o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
    3  I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0
    4  ^  0  0  0  0  0  0  0  0   0   0   0   0   1   0   1   0   0   1   0   0   0   0   0   0   0
    5  @  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0   0   0

and the mask_features file is:

                  V42          V43          V44          V45          V46          V47          V48          V49
    [1,]  2.890891625  2.881188521      2.88778 -2.882606612     -2.77341  2.879834384  2.886483229  2.883815864
    [2,]  2.763404707  2.756198683  2.761863881 -2.756827983 -2.762268531  2.754305072  2.760017050  2.758399799
    [3,]  0.556614506  0.556377530  0.556247414 -0.556300910 -0.556098321  0.557495060  0.557383073  0.556867424
    [4,]  0.367065248  0.366962036  0.366870087 -0.366794442 -0.366644148  0.366613343  0.366537320  0.366953464
    [5,]  0.423692393  0.421835623  0.421741829 -0.421897460 -0.421659824  0.421567705  0.421465738  0.422407838
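One way the call above could be restructured, offered as a sketch only. The data here are simulated, and the code assumes there is one grouping value per row of the predictor table, which the posted samples do not confirm. The names dat and shift are hypothetical helpers.

```r
library(MASS)  # provides lda()

set.seed(1)
n <- 60
shift <- rep(c(0, 1), each = n / 2)   # hypothetical class indicator
G <- factor(shift)                    # grouping must be a factor (or vector), not a data frame

# Simulated stand-in for mask_features: one row per observation, numeric predictors.
mask_features <- data.frame(
  V42 = rnorm(n, mean = 1.5 * shift),
  V43 = rnorm(n, mean = -1 * shift),
  V44 = rnorm(n)
)

# Put the grouping factor and the predictors in ONE data frame, then name the
# factor on the left of the formula and the predictors on the right.
dat <- data.frame(G = G, mask_features)
cvc_lda <- lda(G ~ ., data = dat, na.action = na.omit, CV = TRUE)

# With CV = TRUE, lda() returns leave-one-out class predictions and posteriors
# rather than a fitted model object.
table(observed = dat$G, predicted = cvc_lda$class)
```

The key difference from the failing call is that the response is a factor column inside the same data frame as the predictors, so model.frame() can find both.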
Re: [R] Linear Discriminant Analysis in R
Why exactly do you need lda and not another method? For lda to be applicable, you should check:

1) whether the regressors are normally distributed within the classes
2) whether the variance-covariance matrices are equal for all classes

Essentially, this means that the boundary between two classes is a hyperplane (or, in 2 dimensions, a straight line). Otherwise you can try qda, or go to other supervised learning methods.

How to use lda is explained rather well in the help files. If it doesn't work, provide us with self-contained code (i.e. code that can be run without need of extra information like data frames) that reproduces the error.

Cheers
Joris

PS: There's an error in your code.

    scaled_features <- scale(mask_features, center = FALSE, scale = apply(abs(mask_features, 2, median)))

should be

    scaled_features <- scale(mask_features, center = FALSE, scale = apply(abs(mask_features), 2, median))

On Wed, May 26, 2010 at 5:55 PM, cobbler_squad la.f...@gmail.com wrote:

Dear R gurus,

Thank you all for continuous support and guidance -- learning without you would not be efficient.

I have a question regarding LD analysis and how to best code it up in R.

I have a file of (V52 and 671 time points across all columns) and another file of phonetic features (each vowel is aligned with a distinct binary sequence, i.e. E 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 and so on). I need to run lda (at first for one of the features, meaning one column only extracted from the binary file mentioned above).

In code so far I have very little, but here are short examples of both files:

V57 file:

    V27 V28 V29 V30 V31 V32 V33 V34
    1 -2.515000e-03 -0.203858 6.531000e-03 0.248686 6.76e-04 0.084677 -1.262000e-03
    2 -2.406000e-03 -0.194943 6.248000e-03 0.237851 6.47e-04 0.081001 -1.207000e-03
    3 -4.86e-04 -0.039288 1.263000e-03 0.047980 1.30e-04 0.016292 -2.43e-04

and binary file:

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26
    1  E  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   0   0
    2  o  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   1   0   1   0   1   0   1   0   0   0
    3  I  0  0  0  0  0  0  0  0   0   0   0   0   1   1   0   0   1   0   0   0   0   0   0   0   0

thus in code I have the following:

    library(MASS)
    vowel_features <- read.table(file = "mappings_for_vowels.txt")
    mask_features <- read.table(file = "3dmaskdump_ICA_37_Combined.txt")
    # scale the mask_features file
    scaled_features <- scale(mask_features, center = FALSE, scale = apply(abs(mask_features, 2, median)))
    # input vowel feature, lda
    lda(ROI_values ~ mappings_for_vowels[15]...)

Not sure what is the correct approach to use for lda; any pointers would be greatly appreciated.

thanks again all!
Cobbler
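The scaling step from the quoted code, with the parenthesis fix from the PS, can be sketched as follows. The data frame is simulated, since the real 3dmaskdump file is not available here, and col_scale is a hypothetical helper name.

```r
# Simulated stand-in for mask_features (assumption: all columns numeric).
set.seed(42)
mask_features <- data.frame(
  V42 = rnorm(10, mean = 2),
  V43 = rnorm(10, mean = -3),
  V44 = rnorm(10, mean = 0.5)
)

# abs() must close before the margin argument: apply(X, MARGIN, FUN).
col_scale <- apply(abs(mask_features), 2, median)

# center = FALSE leaves the columns uncentered; each column is divided by its scale value.
scaled_features <- scale(mask_features, center = FALSE, scale = col_scale)

# After scaling, the median absolute value of every column is 1 (up to rounding).
apply(abs(scaled_features), 2, median)
```

In the original line, the closing parenthesis of abs() came after median, so abs() received three arguments and apply() received only one.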
--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
joris.m...@ugent.be
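The two checks Joris lists, and the qda alternative, might be explored roughly as below. This is a sketch on the built-in iris data, not on the poster's files. The Shapiro-Wilk test on a single predictor is only a crude univariate screen, not a test of multivariate normality, and comparing variances by eye is not a formal test of equal covariance matrices.

```r
library(MASS)  # lda() and qda()

# Check 2: per-class covariance matrices of the four iris measurements.
covs <- lapply(split(iris[, 1:4], iris$Species), cov)
sapply(covs, diag)   # compare the variances (diagonals) across classes

# Check 1 (rough): Shapiro-Wilk p-value for one predictor within each class.
tapply(iris$Sepal.Length, iris$Species, function(x) shapiro.test(x)$p.value)

# lda() pools one covariance matrix; qda() estimates one per class.
fit_lda <- lda(Species ~ ., data = iris)
fit_qda <- qda(Species ~ ., data = iris)

# Resubstitution accuracy (optimistic; shown only to compare the two fits).
mean(predict(fit_lda)$class == iris$Species)
mean(predict(fit_qda)$class == iris$Species)
```

If the class covariance matrices look very different, qda or another classifier may be the more natural choice, as the reply suggests.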
Re: [R] Linear Discriminant Analysis
Dear Arup,

See the lda function in the MASS package. In general,

    > require(MASS)
    Loading required package: MASS
    > ?lda

HTH,
Jorge

On Wed, Feb 25, 2009 at 4:44 AM, Arup arup.pramani...@gmail.com wrote:

Kindly let me know the process to carry out a Linear discriminant analysis...thanks in advance

Arup
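For a first pass at the process Arup asks about, a minimal sketch on the built-in iris data could look like this. The 100/50 train/test split and the seed are arbitrary choices for illustration.

```r
library(MASS)  # provides lda() and its predict() method

set.seed(123)
train_idx <- sample(nrow(iris), 100)   # arbitrary split for illustration
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit: grouping factor on the left, predictors on the right.
fit <- lda(Species ~ ., data = train)
fit$scaling            # coefficients of the linear discriminants

# Classify new observations and compare with the known labels.
pred <- predict(fit, newdata = test)
head(pred$posterior)   # posterior class probabilities
table(observed = test$Species, predicted = pred$class)
```

The examples at the bottom of ?lda follow a similar fit-then-predict pattern.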
Re: [R] Linear Discriminant Analysis
Maybe as a starter

    RSiteSearch("linear discriminant analysis")

R has tools to help you help yourself with these types of questions.

-Christos

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Arup
Sent: Wednesday, February 25, 2009 4:45 AM
To: r-help@r-project.org
Subject: [R] Linear Discriminant Analysis

Kindly let me know the process to carry out a Linear discriminant analysis...thanks in advance

Arup