Hey Warren, thank you for your quick reply. Here are some thoughts:

1- I didn't know SpamAssassin was so fast. I've never run it on a server,
but when I run it on my computer it usually takes a while (about one
second) to scan each mail; I always attributed that to all the blacklist
queries. Redirects are indeed a problem, so I'll have to think of
something else, I guess.

2- I hadn't thought of that one. Good point. Even if I avoided 'confirm'
links with the help of regular expressions, spammers often send
personalized URLs, so I would be 'confirming' just by downloading them.
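Just to make that concrete, the kind of pre-filtering I was picturing
looks roughly like the Python sketch below. The word list and the "opaque
token" heuristic for spotting personalized URLs are guesses of mine, not
anything SpamAssassin actually ships:

    import re
    from urllib.parse import urlparse, parse_qs

    # Hypothetical heuristics, purely illustrative.
    CONFIRM_WORDS = re.compile(r'confirm|verify|unsubscribe|opt-?in|activate', re.I)
    TOKEN_LIKE = re.compile(r'^[A-Za-z0-9_-]{16,}$')  # long opaque token

    def safe_to_fetch(url: str) -> bool:
        """True only if the URL looks generic enough to download blindly."""
        if CONFIRM_WORDS.search(url):
            return False
        parsed = urlparse(url)
        # Long opaque path segments or query values suggest a
        # per-recipient URL, which fetching would "confirm".
        segments = [s for s in parsed.path.split('/') if s]
        values = [v for vs in parse_qs(parsed.query).values() for v in vs]
        return not any(TOKEN_LIKE.match(s) for s in segments + values)

    print(safe_to_fetch('http://example.com/promo/pills.html'))           # True
    print(safe_to_fetch('http://example.com/c?id=9f8a7b6c5d4e3f2a1b0c'))  # False

Of course, as you point out, any heuristic like this will miss some
personalized URLs, so it only reduces the risk rather than removing it.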
Has any work been done in SpamAssassin to identify campaigns? That would
surely lighten the costs not only of using webpages, but of SpamAssassin
as a whole.

Thank you,
Marco Túlio Correia Ribeiro

On Tue, Dec 28, 2010 at 5:00 AM, Warren Togami Jr. <[email protected]> wrote:
> Hi Marco,
>
> I'm glad that you are thinking about anti-spam strategies. While this
> approach might be helpful, I am afraid it has the following problems,
> and possibly more:
>
> 1) It is indeed too expensive in terms of data and time. Normally
> SpamAssassin takes a fraction of a second to scan each mail. Your mail
> will become backed up if even a fraction of your incoming mail is waiting
> on usually slow HTTP servers and their usual maze of redirects.
>
> 2) The act of blindly following URLs can have nasty side effects, like
> confirming that your address is alive and thus attracting more spam.
> Sometimes those links "confirm" subscription to a spammer's list, so they
> send more spam and claim that you opted in to it.
>
> Warren Togami
> [email protected]
>
> On Mon, Dec 27, 2010 at 4:22 PM, Marco Ribeiro <[email protected]> wrote:
>>
>> I am aware of the Web Redirect plugin [4], but it was last updated in
>> 2006. Is it too expensive to query for webpages? Does the cost make
>> this approach useless? I was initially thinking of trying to implement
>> this in SpamAssassin as a Google Summer of Code project, but it is
>> such a basic task that (if it's usable) I could probably do it in no
>> time. The classifier I used outputs readable rules, so it would be a
>> piece of cake to translate them into regular expressions. And it seems
>> spammers don't even bother trying to obfuscate the web pages (or maybe
>> they don't even have control over them). For example, 36.7% of the
>> webpages I downloaded contained the word "viagra", and 99.84% of those
>> were spam (the remaining 0.16% were probably spam as well, mislabeled
>> due to some minor error). What do you guys think? Is it worth trying?
>> Any ideas?
>>
>
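P.S. Thinking about your point 1 some more: if fetching were attempted at
all, it would have to be very defensive to keep scan time bounded.
Something like the rough sketch below is what I'd picture; the timeout,
the size cap, and the single example rule are all placeholders I made up:

    import re
    import urllib.request

    # One example rule of the kind my classifier emits; placeholder only.
    RULES = {
        'PAGE_MENTIONS_VIAGRA': re.compile(rb'viagra', re.I),
    }

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        # Refuse redirects entirely: the redirect mazes are where most
        # of the latency (and tracking) lives.
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None

    def score_page(url: str) -> list:
        """Fetch at most 64 KB of the page with a short timeout and
        return the names of the rules that matched."""
        opener = urllib.request.build_opener(NoRedirect)
        try:
            with opener.open(url, timeout=3) as resp:
                body = resp.read(65536)
        except Exception:
            return []  # unreachable pages just contribute no evidence
        return [name for name, pat in RULES.items() if pat.search(body)]

Refusing redirects outright sidesteps the mazes you mentioned, at the
cost of losing whatever evidence sits behind them, and a slow spammer
server can then cost at most the timeout per URL.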
