I was reading a document describing so-called "Locality Sensitive
Hashing" (LSH):

http://www.stanford.edu/class/cs345a/slides/05-LSH.pdf

http://en.wikipedia.org/wiki/Locality-sensitive_hashing

and, while going through it, I started thinking about the ASSP spam and
notspam corpus. As was discussed in the past, a "flood" of similar spam
or ham messages may unbalance the corpus. Locality-sensitive hashing may
offer a solution: since it tends to put similar messages into the same
bucket, the rebuild could simply "skip" messages once their bucket has
been seen too often, which would keep the corpus balanced.
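
To make the idea a bit more concrete, here is a rough Python sketch of
the "skip repeated messages" step. It is not ASSP code: it assumes
MinHash signatures over word 3-grams with the usual banding trick, and
the names and numbers (NUM_HASHES, BANDS, MAX_PER_BUCKET, filter_flood)
are made up for illustration only.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32      # signature length (assumed value)
BANDS = 8            # 8 bands of 4 rows each (assumed value)
MAX_PER_BUCKET = 3   # how many look-alike messages we keep (assumed value)


def shingles(text, k=3):
    """Return the set of k-word shingles of a message, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(seed, token):
    """Stable 64-bit hash of a token; the seed selects the hash function."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(text):
    """MinHash signature: the minimum hash value per seeded hash function."""
    tokens = shingles(text)
    return tuple(min(seeded_hash(seed, t) for t in tokens)
                 for seed in range(NUM_HASHES))


def band_keys(signature):
    """Split the signature into bands; each (band index, rows) is a bucket."""
    rows = NUM_HASHES // BANDS
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(BANDS)]


def filter_flood(messages, limit=MAX_PER_BUCKET):
    """Keep a message unless one of its buckets already holds `limit` messages."""
    counts = defaultdict(int)
    kept = []
    for msg in messages:
        keys = band_keys(minhash(msg))
        if any(counts[key] >= limit for key in keys):
            continue  # looks like part of a flood: leave it out of the corpus
        for key in keys:
            counts[key] += 1
        kept.append(msg)
    return kept
```

With this, a run of identical messages is cut down to the first
MAX_PER_BUCKET copies, while unrelated messages pass through. How
aggressively near-duplicates (not just exact copies) get caught depends
on the number of bands and rows, which would need tuning on real mail.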

Not just that: while writing this I also thought of another possible
use for the above. Let's say we receive an email which, once processed
by LSH (see above), falls into a given "bucket", and let's also say that
the email was recognized as "spam" (or ham, whatever). A second incoming
mail hitting the same "LSH bucket" as the first one then has a fairly
high probability of being "spam" (or ham) too, so the approach may also
be used to help classify messages!
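
A rough Python sketch of that second idea, again only as an
illustration and not as ASSP code: it assumes MinHash signatures split
into bands as the LSH scheme, and the class and method names
(BucketVotes, record, guess) are hypothetical. Each bucket remembers
how many spam and ham messages landed in it, and a new message takes
the majority label of the buckets it hits.

```python
import hashlib
from collections import Counter, defaultdict

NUM_HASHES = 32   # signature length (assumed value)
BANDS = 8         # 8 bands of 4 rows each (assumed value)


def shingles(text, k=3):
    """Return the set of k-word shingles of a message, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(text):
    """MinHash signature built from seeded MD5-based hash functions."""
    def seeded_hash(seed, token):
        digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    tokens = shingles(text)
    return tuple(min(seeded_hash(seed, t) for t in tokens)
                 for seed in range(NUM_HASHES))


def band_keys(signature):
    """Split the signature into bands; each (band index, rows) is a bucket."""
    rows = NUM_HASHES // BANDS
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(BANDS)]


class BucketVotes:
    """Per-bucket tally of the labels of already classified messages."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def record(self, text, label):
        """Remember that a message with this label fell into these buckets."""
        for key in band_keys(minhash(text)):
            self.votes[key][label] += 1

    def guess(self, text):
        """Majority label over all buckets the message hits, or None."""
        tally = Counter()
        for key in band_keys(minhash(text)):
            if key in self.votes:
                tally.update(self.votes[key])
        if not tally:
            return None
        return tally.most_common(1)[0][0]
```

A message identical to one already recorded always hits the same
buckets; a slightly modified copy hits at least one shared bucket with
high but not certain probability, so the result is probably best used
as one more hint next to the existing checks rather than as a verdict
on its own.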




_______________________________________________
Assp-test mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/assp-test
