Henry Stern wrote:
Apple Mail uses latent semantic analysis for clustering

That sounds right. Some people there were looking at that for document retrieval when I worked at Apple Research in the mid-90's.


By the way, have you seen the work applying case-based reasoning to spam filtering? There are two articles on that at

http://www.cs.tcd.ie/publications/tech-reports/tr-index.04.html

with a bit more at the home page of one of the authors:

http://www.comp.dit.ie/sjdelany/

I've been thinking about whether there might be benefit in making finer distinctions than just spam or not-spam, perhaps by clustering spam into topics. Why should the characteristics of porn spam, multilevel marketing spam, Nigerian 419 scams, etc., be combined? Would there be benefit from making their differences explicit?
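
To make the idea concrete, here is a rough sketch of what finer-grained labels could look like. It is only an illustration under my own assumptions: it uses a toy multinomial naive Bayes classifier as a stand-in (not latent semantic analysis, not case-based reasoning, and not any existing filter's code), the class name TopicBayes, the helper is_spam, the label names, and the four training messages are all made up, and whitespace tokenizing is far cruder than anything a real filter would use. Each spam topic keeps its own token statistics, and the usual two-way decision is recovered by collapsing the topic label at the end.

```python
import math
from collections import Counter, defaultdict


class TopicBayes:
    """Toy multinomial naive Bayes over arbitrary labels (hypothetical helper)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()               # label -> number of messages
        self.vocab = set()                        # every token seen in training

    def train(self, label, text):
        tokens = text.lower().split()
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.token_counts.items():
            # Log prior plus log likelihoods, with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def is_spam(label):
    # Collapse the fine-grained topic label back to spam / not-spam.
    return label != "ham"


# Invented training messages, one or two per label, purely for illustration.
nb = TopicBayes()
nb.train("ham", "meeting agenda for the filter project attached")
nb.train("ham", "patch for the rules file please review")
nb.train("spam-419", "urgent transfer of funds from nigeria bank account")
nb.train("spam-mlm", "earn money from home join our marketing team")

label = nb.classify("urgent bank transfer of funds")
print(label, is_spam(label))
```

Whether separate topics actually beat a single combined spam class is exactly the open question; a sketch like this would only be a starting point for measuring it on a real corpus.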

 -- sidney


