Dear All,

I have used the lda() function in the MASS package to estimate a set of 
discriminant functions that assign samples from a training set to one of six 
groups.  Cross-validation gives nearly perfect predictions for the samples 
in the training set.  Hooray!

Now I want to use predict() on the fitted lda object (that is, the 
predict.lda() method) to estimate both discriminant function scores and 
probabilities of group membership for a second set of samples whose group 
membership is unknown.  For each unknown sample, predict() produces six 
probabilities, one per group, and these probabilities sum to one.  So predict() 
seems to assume that every unknown sample does, in fact, belong to one of the 
six groups.  
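
Here is a minimal, reproducible sketch of the behaviour I am describing.  It 
uses the built-in iris data (three groups) as a stand-in for my own six-group 
data, and the "unknown" sample is a made-up point chosen to lie far from every 
group, so the names and values below are purely illustrative.

```r
library(MASS)  # provides lda() and its predict() method

# Stand-in training data: iris has three groups, not my six
fit <- lda(Species ~ ., data = iris)

# A made-up "unknown" sample that lies far from every iris group
unknown <- data.frame(Sepal.Length = 20, Sepal.Width = 20,
                      Petal.Length = 20, Petal.Width = 20)

pred <- predict(fit, newdata = unknown)

pred$x                    # discriminant function scores
pred$posterior            # one probability per group
rowSums(pred$posterior)   # sums to one even for this obvious outlier
```

The posterior still sums to one here, which is what makes me think it only 
measures membership relative to the other groups.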

The problem is that it is nearly certain that some of the unknown samples in 
the second set do not belong to any of the six groups. For those samples, 
probabilities of group membership should be close to zero for all six groups.  
In fact, identifying which samples are unlikely to belong to any of the six 
groups is a major goal of the analysis. 

So the question is, what is predict() doing behind the scenes to force the 
group membership probabilities to sum to one? How do I get it to not do this 
and instead produce probabilities that accurately reflect the large Mahalanobis 
distances of some of the unknown samples from every group centroid?
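
To make concrete what I am after, here is a rough sketch of the kind of 
quantity I have in mind, again with iris standing in for my data.  It is my own 
guess at an approach, not something predict() does: it computes squared 
Mahalanobis distances from each unknown sample to each group centroid using the 
pooled within-group covariance, and converts them to upper-tail chi-square 
probabilities.  The chi-square reference assumes multivariate normal groups and 
ignores the error in the estimated means and covariance, and the two "unknown" 
rows are made up.

```r
library(MASS)

X <- as.matrix(iris[, 1:4])
g <- iris$Species
fit <- lda(X, grouping = g)

# Pooled within-group covariance, the matrix LDA assumes all groups share
resid <- X - fit$means[as.character(g), ]
S <- crossprod(resid) / (nrow(X) - nlevels(g))

# Two made-up unknowns: one near the setosa centroid, one far from everything
unknown <- rbind(typical = c(5.0, 3.4, 1.5, 0.2),
                 outlier = c(20, 20, 20, 20))

# Squared Mahalanobis distance from each unknown (rows) to each centroid (cols)
d2 <- apply(fit$means, 1,
            function(m) mahalanobis(unknown, center = m, cov = S))

# Upper-tail chi-square probability for each distance; rows need not sum to one
typicality <- pchisq(d2, df = ncol(X), lower.tail = FALSE)
typicality
```

With something like this, a sample that is far from every centroid would get a 
value near zero for all groups, which is the behaviour I was hoping to get from 
the posterior probabilities.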

I have searched the R-help archives on this and have found several people 
asking similar questions, but no helpful answers.

Thanks very much!

Fraser
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.