> If I have the time, I'll give my suggestions regarding the use
> of SPF and RDNS a shot, and report back on the results. My hunch
> is that they'll offer decent improvements, especially in handling
> first time senders. Better, perhaps I'll process the message logs
> and give some feedback on how this approach might fare.
I ran some tests using our mail logs as input data. After reviewing the results and the process I used, I think I can get better and more useful results by changing the analysis program, but that will be more work and will have to wait for another day.

My program tracked 'grey_new' entries, where we record the first time we see a (sender_from_address, recipient_address, helo_ip/24) triple. I also extracted the helo_address (e.g. [EMAIL PROTECTED]) from a related log entry. Given this triple, I can look it up in the SQL database to see whether we ultimately accepted mail after the tempfail, or whether the sender never returned. At the moment we don't recycle the triples in our database, and it is rather large (about 350,000 entries). I've been meaning to run some programs on the data to come up with a blacklist of particularly offensive ISPs, so I have not recycled old entries.

For each of these sender/recipient/helo_address triples, I noted whether the helo_address appeared to be forged (based upon sendmail's determination), and, using Mail::SPF::Query, whether the sender_address/helo_address pair registered as an SPF 'pass' or as anything else. With this data in hand, I looked at messages that either (1) received a 'pass' from SPF, or (2) had a helo_address that was not forged, where the sender's from address matched the domain part of the helo address.

Of a total of 38215 new greylist entries (first time seen, tempfailed), 1631 met the SPF/sender address criteria. Of those 1631 entries, 682 were ultimately accepted for delivery, so there was no harm in whitelisting them early using the heuristic. 931 would have been 'false positives': we would have accepted them early using the heuristic, when in fact they never retried after the tempfail under the old scheme. The heuristic would have accepted 4.2% of newly seen triples, with a roughly 60/40 split of 'false positives' to messages that would ultimately have been whitelisted anyway.
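The selection rule and tally described above can be sketched roughly as follows. This is a minimal illustration, not the analysis program used for these tests: the GreyEntry record, its field names, and the helper functions are hypothetical. The SPF result is assumed to have been computed already (for example with Mail::SPF::Query, as in the tests above) and passed in as a plain string. The helo/sender domain match is also an assumed interpretation: the HELO name must equal the sender's domain or be a subdomain of it. The /24 key assumes IPv4 client addresses.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class GreyEntry:
    """One first-time ('grey_new') greylist entry. Hypothetical record layout."""
    sender: str           # envelope sender (from) address
    recipient: str        # envelope recipient address
    helo_ip: str          # client IPv4 address
    helo_name: str        # HELO/EHLO name the client announced
    helo_forged: bool     # sendmail flagged the HELO as possibly forged
    spf_result: str       # precomputed SPF result, e.g. "pass", "softfail", "none"
    accepted_later: bool  # the client retried and its mail was accepted


def greylist_key(sender: str, recipient: str, helo_ip: str) -> tuple:
    """(sender, recipient, client /24) triple; assumes an IPv4 client address."""
    net = ipaddress.ip_network(f"{helo_ip}/24", strict=False)
    return (sender.lower(), recipient.lower(), str(net))


def sender_domain_matches_helo(sender: str, helo_name: str) -> bool:
    """Assumed rule: the HELO name is the sender's domain or a subdomain of it."""
    domain = sender.rpartition("@")[2].lower().rstrip(".")
    helo = helo_name.lower().rstrip(".")
    return bool(domain) and (helo == domain or helo.endswith("." + domain))


def early_accept(e: GreyEntry) -> bool:
    """Skip the tempfail if (1) SPF passes, or (2) the HELO is unforged and matches."""
    if e.spf_result == "pass":
        return True
    return not e.helo_forged and sender_domain_matches_helo(e.sender, e.helo_name)


def tally(entries: list) -> dict:
    """Count how the entries flagged by early_accept fared under plain greylisting."""
    total = len(entries)
    flagged = [e for e in entries if early_accept(e)]
    retried = sum(1 for e in flagged if e.accepted_later)
    never_retried = len(flagged) - retried  # the 'false positives'
    return {
        "total": total,
        "flagged": len(flagged),
        "retried": retried,
        "never_retried": never_retried,
        "flagged_pct": 100.0 * len(flagged) / total if total else 0.0,
        "extra_pct": 100.0 * never_retried / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        GreyEntry("alice@example.com", "bob@ourdomain.test", "192.0.2.10",
                  "mail.example.com", False, "none", True),
        GreyEntry("x@spam.test", "bob@ourdomain.test", "198.51.100.7",
                  "dsl-7.isp.test", False, "pass", False),
        GreyEntry("y@bank.test", "bob@ourdomain.test", "203.0.113.5",
                  "bank.test", True, "neutral", False),
    ]
    print(tally(sample))
```

Entries that the rule flags but that never retried correspond to the 'false positives' above. Dividing that count by the total number of new entries gives the extra share of delivered mail that the access list and content filters would have to handle.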
Note that not all of the 'false positives' were necessarily spammers; some may have been legitimate senders using poorly configured software. In any event, at worst this technique adds only about 2.5% more entries that are delivered and must subsequently be processed by the access list and content filters.

I hand-inspected a few of the entries that the technique above would have allowed to bypass the greylist early. The heuristic did a good job of letting legitimate first-timers through, which of course is the point of going to all the trouble to make these extra checks.

Overall, I'd say this heuristic, using SPF and a simple analysis of the sender address and helo address, shows promise for improving the system's ability to let legitimate first-time senders through immediately. It might be improved further by also validating the helo address as a valid MX for the sender's domain, or by noting that it is in the same /24 as the sender.

- Gary
_______________________________________________
NOTE: If there is a disclaimer or other legal boilerplate in the above message, it is NULL AND VOID. You may ignore it.

Visit http://www.mimedefang.org and http://www.roaringpenguin.com
MIMEDefang mailing list
[email protected]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang

