Michael, could you paste that to the bug on the bugzilla? Your comments will not make it in there otherwise.
--j.

Michael Peddemors writes:
> On Thursday 01 May 2008 04:03, [EMAIL PROTECTED] wrote:
> > Steven Champeon has been in touch regarding 'testing my enemieslist rDNS
> > patterns data against the SpamAssassin spam/ham corpus(es) to see if
> > there's a reason for us to collaborate.'
>
> > I'm curious to see how incorporating EL DNSBL lookups into SpamAssassin
> > might be useful; we have a DNSBL mirror network (currently three hosts,
> > with more on the way) or I can talk about how to use it with a patched
> > rbldnsd if you wanted to do some local testing. It'd be really
>
> Actually, it is surprising that SA hasn't looked at something like this
> already.. We also use a similar method in our Mail Server technologies,
> albeit we do it in the SMTP layer.. but I think this begs a few questions..
>
> o Should it be RBL based..
>
> In the past SA users have been stung by RBL based lookups, when RBLs get
> blocked etc.. leading to very high system loads..
>
> o Should SA start integrating a definition update program for something like
> this?
>
> Compiling even 10k regex patterns takes very little overhead, and by doing
> daily updates of a locally cached list there is little risk of problems: even
> when the updater fails, the latest regexes will always be on hand.
>
> o Should this use one regex supplier, or be community based?
>
> Community based might be more helpful, since there are projects like
> Enemies List, our own DynaRegex .. or other companies, projects etc.. that
> might evolve out of this.
>
> It also could have several different types of regex patterns, as mentioned
> below, so that SA users could choose score settings for some patterns
> differently than others.. Some patterns are safe enough to score very high,
> while generic shared webhost patterns may want to be scored a little lower.
>
> I think that the regex pattern database would be an excellent candidate for
> building out an SA definition updater..
>
> > OK, sounds good. I'm really interested in seeing what the various FP
> > rates would be for both the HELO and PTR for the various return values;
> > I'm also interested in seeing what rates are for the different
> > subclasses (as formed by the combination of A response and TXT response
> > for the same lookup, so "static/cable" or "dynamic/dsl" or
> > "natproxy/vpn"). Basically, I'm using these today as very blunt hammers,
> > and I want to make sure I have a good sense of how to better tune the
> > scoring. And you guys have such great stats, so I came to you :)
> >
> > > So, these are generally run against the SMTP connecting host's
> > > rDNS, right?
> >
> > Both PTR and HELO/EHLO string, yes. We've found that PTR is a good
> > indicator, but when the HELO string is a match for some EL pattern it's
> > a very reliable indicator of bot activity with a very low FP rate, so we
> > test both when available. Of course, this differs between the various
> > types, so I wouldn't assume webhost or outmx or static PTR are
> > necessarily bad, just indicative. But we'll see what the numbers
> > look like after we run some tests, I suppose :)
> >
> > > By the way, do you mind if we conduct this conversation on a public
> > > Bugzilla entry? That's generally how we do it. Doing that in the
> > > open is also more likely to get useful info on how other hosts
> > > have found the increased load from SpamAssassin lookups, too.
> >
> > No, not at all, though I definitely want to know how adding this to
> > SA would affect our load; and give me time to throw a few more rbldnsd
> > mirrors into the rotation if required. (Running lookups against the
> > patterns is very fast, 75K/s here on my macbook, but once you add
> > logging and DNS overhead it slows down considerably :-/)
> >
> > So, what next? Should we look at setting up a local rbldnsd instance
> > to isolate testing from our production machines? Was the doc I sent
> > a URL for in my last email sufficient to tweak whatever SA rules
> > you need to test? I'm here to answer any questions you have :)
> >
> > Anyway, usage details are here: http://enemieslist.com/how/use.html --
> > we'd need to add some rules to do this. I've been meaning to do this for
> > several weeks(!) but things have been busy :( so here's a new ticket.
>
> --
> "Catch the Magic of Linux..."
> ------------------------------------------------------------------------
> Michael Peddemors - President/CEO - LinuxMagic
> Products, Services, Support and Development
> Visit us at http://www.linuxmagic.com
> ------------------------------------------------------------------------
> A Wizard IT Company - For More Info http://www.wizard.ca
> "LinuxMagic" is a Registered TradeMark of Wizard Tower TechnoServices Ltd.
> ------------------------------------------------------------------------
> 604-589-0037 Beautiful British Columbia, Canada
>
> This email and any electronic data contained are confidential and intended
> solely for the use of the individual or entity to which they are addressed.
> Please note that any views or opinions presented in this email are solely
> those of the author and are not intended to represent those of the company.
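
For anyone following the thread, here is a rough sketch of the locally cached regex approach discussed above: patterns grouped into classes, each class scored differently, both the PTR and the HELO string tested, and the cached list kept when an update fails. Everything in it is an assumption made for illustration. The class names, scores, sample patterns, and function names are hypothetical and are not taken from Enemies List, DynaRegex, or SpamAssassin, and it is written in Python rather than as an SA plugin.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical per-class scores: high for classes considered safe to score
# heavily, lower for generic shared webhost names.
CLASS_SCORES = {"dynamic": 3.0, "webhost": 1.0, "static": 0.5}

# Hypothetical (class, regex) entries standing in for a downloaded list.
SAMPLE: List[Tuple[str, str]] = [
    ("dynamic", r"^\d+-\d+-\d+-\d+\.dsl\.example\.net$"),
    ("webhost", r"^server\d+\.hosting\.example\.com$"),
]


@dataclass(frozen=True)
class Pattern:
    pattern_class: str
    regex: "re.Pattern[str]"


def compile_patterns(entries: Iterable[Tuple[str, str]]) -> List[Pattern]:
    """Compile (class, regex) pairs, skipping entries that fail to compile."""
    compiled = []
    for pattern_class, source in entries:
        try:
            compiled.append(Pattern(pattern_class, re.compile(source, re.IGNORECASE)))
        except re.error:
            continue  # one bad entry should not discard the whole list
    return compiled


def load_with_fallback(
    fetch: Callable[[], Iterable[Tuple[str, str]]],
    cached: Iterable[Tuple[str, str]],
) -> List[Pattern]:
    """Prefer freshly fetched patterns; keep the cached copy if the fetch fails."""
    try:
        fresh = compile_patterns(fetch())
    except Exception:
        fresh = []
    return fresh or compile_patterns(cached)


def classify(hostname: str, patterns: List[Pattern]) -> Optional[str]:
    """Return the class of the first pattern matching hostname, or None."""
    for pattern in patterns:
        if pattern.regex.search(hostname):
            return pattern.pattern_class
    return None


def score(ptr: Optional[str], helo: Optional[str], patterns: List[Pattern]) -> float:
    """Sum the class scores for whichever of PTR and HELO are available."""
    total = 0.0
    for name in (ptr, helo):
        if name:
            total += CLASS_SCORES.get(classify(name, patterns), 0.0)
    return total
```

With the sample list, `score("10-0-0-1.dsl.example.net", "10-0-0-1.dsl.example.net", compile_patterns(SAMPLE))` adds the hypothetical "dynamic" score once for the PTR and once for the HELO. A real rule set would presumably weight a HELO match differently from a PTR match, given the FP rates mentioned above, but those weights would have to come from corpus testing.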
