Hi, I have a dataset consisting of 2 groups of samples measured on a set of variables. I would like to fish out a subset of variables, or a combination of subsets of variables, that discriminates between the 2 groups, so I chose to use LDA from the MASS library in R for the analysis.
I found that whether or not I normalize the data (mean 0, variance 1), the final predictions are exactly the same. I checked this by plotting the discriminant scores of the 2 groups of samples and got plots of exactly the same shape (although the scale of the scores is different). The linear discriminant coefficients, of course, do differ between the normalized and unnormalized data, so a different set of variables gets picked out when I rank them by the size of the coefficients.

My first question: is it correct that you get the same final predictions whether or not you normalize the data? If so, is it correct that the sole purpose of normalization is to let you judge the discriminant power of the original variables, since all variables are then on the same scale and no variable dominates the others simply because its variance is large, while the predictions for the samples do not change?

Second, I found that if I run a t test on each of the original variables, the t statistics are not linearly related to the discriminant power of the variables, whether I normalize or not. That is, a variable with a large t statistic may have very low discriminant power, judging by its discriminant coefficient. How can this be explained?

My last question: my understanding of linear discriminant analysis with multiple variables is that if 2 variables are highly correlated (correlation close to 1), LDA will not give both of them large discriminant coefficients. It will give only one of them a large coefficient, since the 2 variables are redundant. However, in my results on the data above, many of the variables with large discriminant coefficients are highly correlated with each other. Is this normal?

Thank you very much.
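
For anyone who wants to reproduce the first observation, here is a minimal sketch on simulated data, not my actual dataset. The variable names v1 to v3, the group labels, the sample size, and the means and standard deviations are all made up for illustration. It fits lda() from MASS on the raw and on the standardized variables, then compares the predicted classes, the discriminant scores, and the coefficients.

```r
library(MASS)  # provides lda()

set.seed(1)
n <- 50
group <- factor(rep(c("A", "B"), each = n))

# Hypothetical data: three variables on very different scales
x <- cbind(
  v1 = rnorm(2 * n, mean = ifelse(group == "A", 0, 1), sd = 1),
  v2 = rnorm(2 * n, mean = ifelse(group == "A", 0, 5), sd = 10),
  v3 = rnorm(2 * n, mean = 0, sd = 100)
)

# Standardize each column to mean 0, variance 1
x_std <- scale(x)

fit_raw <- lda(x, grouping = group)
fit_std <- lda(x_std, grouping = group)

pred_raw <- predict(fit_raw, x)
pred_std <- predict(fit_std, x_std)

# Are the predicted classes the same for every sample?
same_class <- all(pred_raw$class == pred_std$class)

# Do the discriminant scores carry the same information?
# abs() because the sign of a discriminant axis is arbitrary.
score_cor <- abs(cor(pred_raw$x[, 1], pred_std$x[, 1]))

# How do the coefficients change under standardization?
coef_ratio <- fit_std$scaling[, 1] / fit_raw$scaling[, 1]
col_sd <- apply(x, 2, sd)

print(same_class)
print(score_cor)
print(rbind(abs_coef_ratio = abs(coef_ratio), col_sd = col_sd))
```

In this sketch the predicted classes agree and the two sets of scores are perfectly correlated. The standardized coefficients come out as the raw coefficients multiplied by each variable's standard deviation (up to a common sign), which is why ranking variables by coefficient size changes after normalization even though the predictions do not.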
