I have been reading Paul Graham's essays on spam filters, and I am amazed at the effectiveness of his statistical filters.

I haven't encountered a big spam problem (I guess I am not popular enough yet). However, I do have a huge amount of mail coming into my mailboxes: tons from mailing lists, and quite a few from my banks, my universities, my friends, and a bunch of opt-in promotions, alerts, etc. Most of it doesn't qualify as spam. Still, a large percentage of my mail is not something I want to read promptly, and some of it I only read from time to time and skip most of the time.

My current strategy is to use procmail to sort my mail into different mailboxes (over a dozen at the moment, and growing). However, it still annoys me because, for example, my most frequently read mailbox, lfschat, still contains only a very small portion of mail that I am really interested in reading.

So while reading Paul's essay, I got an idea: apply the statistical filter to all my mail, sorting it not just into two categories but into several, such as Spam, Interesting, Advertisement, AccountUpdate, StrangeLogEventsAndAlerts, PrivateMustRead, MildInterest, LeastInterest, etc.

Admittedly, the simple-minded token treatment in Paul's essay may not be very effective for the non-spam categories, but without actually trying it out, who knows; it may amaze me.
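A minimal sketch of the multi-category idea follows. It is an assumption on my part, not Paul's exact method: his essay combines per-token spam probabilities for the most "interesting" tokens in a message, whereas this swaps in a plain multinomial naive Bayes classifier with add-one smoothing, which extends naturally to any number of categories. The tokenizer, the class name MultiCategoryFilter, and the sample training messages are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer (an assumption, not Paul's exact rules):
# lowercase runs of letters, digits, '$', apostrophes and hyphens.
TOKEN_RE = re.compile(r"[a-z0-9$'\-]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class MultiCategoryFilter:
    """Multinomial naive Bayes over an arbitrary set of mail categories."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        self.total_tokens = Counter()             # category -> tokens seen
        self.doc_counts = Counter()               # category -> messages seen
        self.vocab = set()                        # all tokens seen in training

    def train(self, text, category):
        tokens = tokenize(text)
        self.token_counts[category].update(tokens)
        self.total_tokens[category] += len(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        """Return an unnormalized log-probability score for each category."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab) or 1  # guard against an empty vocabulary
        result = {}
        for cat in self.doc_counts:
            # Prior: fraction of training messages filed under this category.
            score = math.log(self.doc_counts[cat] / total_docs)
            denom = self.total_tokens[cat] + vocab_size
            for tok in tokenize(text):
                # Add-one smoothing so an unseen token never zeroes a category.
                score += math.log((self.token_counts[cat][tok] + 1) / denom)
            result[cat] = score
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Hypothetical training data, one message per category.
f = MultiCategoryFilter()
f.train("cheap pills buy now limited offer", "Spam")
f.train("your account statement is ready balance", "AccountUpdate")
f.train("kernel build failed on glibc patch", "Interesting")
print(f.classify("new statement balance for your account"))  # prints AccountUpdate
```

Each category simply keeps its own token counts, so adding a new mailbox means training on some examples of it. Whether tokens this crude can separate, say, MildInterest from LeastInterest is exactly the open question above.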

Any comments?

Cheers,

--
Hui Zhou
--
http://linuxfromscratch.org/mailman/listinfo/lfs-chat
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page