Thanks, David - again, good info. Ken
-----Original Message-----
From: David F. Skoll [mailto:[EMAIL PROTECTED]
Sent: Monday, June 20, 2005 4:22 PM
To: [email protected]
Subject: Re: [Mimedefang] Using a db for subject lines to block

Cormack, Ken wrote:

> Can anyone see any problems with the code below?  Just logging, it appears
> to be working pretty well.

You may want to make your subject canonicalization a little smarter,
like:

    $lc_subject =~ s/^\s+//;   # Trim leading whitespace
    $lc_subject =~ s/\s+$//;   # Trim trailing whitespace
    $lc_subject =~ s/\s+/./g;  # Collapse whitespace into periods

The third regexp will collapse multiple runs of spaces, so:

    really   cheap    mortgages

gets collapsed into

    really.cheap.mortgages

You might (or might not?) want to delete other non-letter characters.

> # scan database for each word in the subject

I wonder if you want to remember repeated words?  Otherwise something like
"a a a a a a a a a a a a a a a" can make you do an awful lot of DB lookups.
Probably not a big deal in practice.

Regards,

David.
_______________________________________________
Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list
[email protected]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
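
For reference, a minimal sketch of the two suggestions in the quoted message: canonicalizing the subject, then looking at each distinct word only once. The sub names canonicalize_subject and unique_words are hypothetical, the lc() call is an assumption based on the variable name $lc_subject, and the actual database lookup from the original code is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the subject (assumed from the name
# $lc_subject), trim it, and collapse whitespace runs into periods.
sub canonicalize_subject {
    my ($subject) = @_;
    my $lc_subject = lc $subject;
    $lc_subject =~ s/^\s+//;     # Trim leading whitespace
    $lc_subject =~ s/\s+$//;     # Trim trailing whitespace
    $lc_subject =~ s/\s+/./g;    # Collapse whitespace into periods
    return $lc_subject;
}

# Hypothetical helper: split a canonical subject on periods and return
# each distinct word once, in first-seen order, so repeated words cost
# only one lookup each.
sub unique_words {
    my ($canonical) = @_;
    my %seen;
    return grep { length($_) && !$seen{$_}++ } split /\./, $canonical;
}

my $canonical = canonicalize_subject("  Really   cheap    mortgages ");
print "$canonical\n";            # really.cheap.mortgages

my @words = unique_words(canonicalize_subject("a a a a a b a"));
print "@words\n";                # a b
```

With this, a subject such as "a a a a a a a a a a a a a a a" yields a single word to check instead of fifteen.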

