On Thursday, January 13, 2005, 7:00:23 PM, Daniel Quinlan wrote:
> So, as you may be aware, we have a minor issue in terms of figuring out
> which whitelisted domains should be skipped in queries.
>
> SpamAssassin now ships with a list of domains that are excluded for
> SURBL lookups from the SURBL whitelist. This list is the 125
> most commonly queried domains.
>
> SURBL counts the number of queries each domain receives to track the
> most commonly queried domains so we can produce an accurate list of
> domains.
>
> But, once we skip a domain, its relative volume is going to drop way
> off in the SURBL data.
>
> One idea I had to fix this is that SA not use the SURBL whitelist for 1
> in 10 queries and that those be directed to a different zone. However,
> that would be somewhat counterproductive in terms of DNS caching and I'm
> not sure how happy Jeff would be about the idea.
>
> Another way would be to not use the exclusion list for certain periods
> of time if you could select just those times for generating volume
> data. A bit too hacky.
>
> Another way to fix the problem would be to rank the domains with some
> other source of volume data (not SURBL-related) such as looking at a DNS
> cache at a large ISP.
>
> Any other ideas?
>
> Daniel

As a matter of fact, Sonic (a medium-large ISP) has offered me a ham
and spam URI host feed, but I have not had a chance to look at it yet.
The ham data could be a source of good whitelist domains.

Jeff C.
--
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
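
P.S. For reference, Daniel's "1 in 10 queries" idea could be sketched roughly as below. This is only an illustration in Python, not SpamAssassin code: the zone names, the function names, and the `sample_every` parameter are all hypothetical, and the scale-up step in `estimate_volume` is my own assumption about how the sampled counts would be turned back into volume estimates.

```python
import random

# Placeholder zone names, invented for this sketch.
MAIN_ZONE = "main.zone.example"
SAMPLE_ZONE = "sample.zone.example"


def pick_zone(domain, skip_list, sample_every=10, rng=random):
    """Return the zone to query for `domain`, or None to skip the lookup.

    Domains outside the skip list always go to the main zone.
    Domains on the skip list are skipped, except that roughly 1 in
    `sample_every` lookups is sent to a separate sampling zone so the
    domain's query volume can still be measured.
    """
    if domain not in skip_list:
        return MAIN_ZONE
    if rng.randrange(sample_every) == 0:
        return SAMPLE_ZONE
    return None


def estimate_volume(sampled_count, sample_every=10):
    """Scale a count seen in the sampling zone back to an estimated total."""
    return sampled_count * sample_every


if __name__ == "__main__":
    skip = {"example.com"}
    print(pick_zone("example.net", skip))   # not on the skip list
    print(pick_zone("example.com", skip))   # skipped or sampled
    print(estimate_volume(12))
```

The caching concern Daniel raises still applies: every sampled query is one that a skip list would otherwise have avoided entirely.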
