To the best of my knowledge, Apple Mail uses latent semantic analysis
for clustering. I wrote a Slashdot comment about this a while back:
http://slashdot.org/comments.pl?sid=108111&cid=9194254
Henry
Sidney Markowitz wrote:
I stumbled across this article
http://www.macdevcenter.com/pub/a/mac/2004/05/18/spam_pt2.html
while Googling around for anything that relates cluster analysis
techniques to spam filtering.
This may be old knowledge to some people here, but it was new to me.
Apparently the trainable spam filter in Apple's Mail program does not
use the Bayesian approach that we are familiar with. It uses a cluster
discovery tool that was developed for document search and retrieval.
It would be interesting to compare this approach to Bayes. I'm also
curious whether it offers any hints about applying techniques from
bioinformatics (as Justin referred to in a recent message to this
list), such as UPGMA cluster analysis (http://www.nmsr.org/upgma.htm).
-- sidney