The one issue I have with this is:
 
1) Forward full original source to Sniffer with license code.
If we could do it without the license code, it would be much easier to automate on our end.  I already have a process in place to copy and reroute false positives by rewriting the Q file.  I'm hesitant to alter the message itself to add the license code.  If we could authenticate the FP report via some other means it would help greatly.  How about connecting IP instead?
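A minimal sketch of what authenticating by connecting IP could look like, assuming Sniffer kept a table of each customer's registered mail-server addresses. The table, the license IDs and the function name are all hypothetical; nothing here reflects how Sniffer actually works.

```python
import ipaddress

# Hypothetical table: networks a customer has registered -> that customer's license ID.
# The addresses are documentation-only ranges, used purely for illustration.
REGISTERED_NETWORKS = {
    "192.0.2.0/24": "license-A",
    "198.51.100.17/32": "license-B",
}


def license_for_ip(connecting_ip, registered=REGISTERED_NETWORKS):
    """Return the license ID registered for the connecting IP, or None if unknown."""
    addr = ipaddress.ip_address(connecting_ip)
    for cidr, license_id in registered.items():
        if addr in ipaddress.ip_network(cidr):
            return license_id
    return None
```

With something like this, an FP report arriving from a registered address could be tied to a license without touching the message itself; reports from unknown addresses would still need the license code.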

Darin.
 
 
----- Original Message -----
From: Matt
Sent: Wednesday, June 07, 2006 12:59 AM
Subject: Re: [sniffer]FP suggestions

Pete,

Regarding suggestions for easing the reporting process, I would recommend the following possible modifications:
1) An E-mail submission tool similar to the one now, but replies would be automated
2) Send back links or rather an HTML form with checkboxes in an E-mail auto-response allowing one to block rules.
3) Make blocked rules automatic for the submitter, but throw them into a queue for manual review by Sniffer folk in order to determine whether the blocks should be applied to all rulebases.
4) Have automatic triggers that lower rule strengths based on users blocking rules regardless of direct Sniffer action.
The gist of this is to make it more point and click.  The fact that you need full source is cumbersome, so the above recommendations seek ways to make the process easier for both the customer and for Sniffer while dealing with the need to send the full source.  No direct customer interaction would be necessary in most cases, and you would have a queue full of items to review and make a determination about that customers have pre-screened for you.  To the customer, the process would look like the following:
1) Forward full original source to Sniffer with license code.
2) Seconds later there would be an automated reply received in HTML format with a check box for every rule failed (or note that no active rules were found), a text box for optional comments, and submit button.
3) Customer checks the boxes for the rules they want to block, adds notes in the text field if they feel like it, and presses submit.  End of story.
You could also add a Web interface for this if you wanted to, but E-mail seems the most appropriate for most.
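The bookkeeping behind suggestions 3 and 4 could be sketched roughly as follows. This is only an illustration under assumed names and numbers: the class, the starting strength of 1.0 and the 0.1 penalty per blocking customer are all hypothetical, not anything Sniffer exposes.

```python
from collections import defaultdict, deque


class RuleBlockTracker:
    """Hypothetical tracker: per-customer rule blocks, a review queue, rule strengths."""

    def __init__(self, initial_strength=1.0, penalty=0.1, floor=0.0):
        self.blocked_by = defaultdict(set)   # rule_id -> customers who blocked it
        self.review_queue = deque()          # (customer, rule_id, comment) for manual review
        self.strength = defaultdict(lambda: initial_strength)
        self.penalty = penalty
        self.floor = floor

    def block(self, customer, rule_id, comment=""):
        """Block a rule for one customer, queue it for review, and lower its strength."""
        if customer in self.blocked_by[rule_id]:
            return  # a repeat block from the same customer changes nothing
        self.blocked_by[rule_id].add(customer)
        self.review_queue.append((customer, rule_id, comment))
        self.strength[rule_id] = max(self.floor, self.strength[rule_id] - self.penalty)

    def is_active_for(self, customer, rule_id):
        """A rule stays active for everyone who has not blocked it."""
        return customer not in self.blocked_by[rule_id]
```

The block takes effect for the submitter immediately, while the decision to pull the rule from all rulebases stays with whoever works the review queue.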

I don't think it would be beneficial to rehash a lot of things involving how FP's occur, at least on this list.  I know from my system, where my customers have single-click reprocessing capability, that they miss about 97% of all FP's, either because they don't bother to review or because they don't bother to reprocess anything but personal E-mail that may get blocked.  I would imagine that Sniffer sees a similar rate of customer reported FP's, due in part to the difficulty and in part to the same reasons that apply to my own users.

The three biggest sources of false positives are obscure foreign domains/IP's, rules generated from bulk mailings that are too broadly targeted, and things reported to Sniffer that are advertising, but not spam.  All three of these things are difficult and time consuming to deal with, particularly the last two.  Here's some stats for Sniffer FP's on my system going back about 15 months:

SNIFFER-GENERAL         283
SNIFFER-EXPERIMENTAL    167    * Excluded 79 FP's from bad rule event on 1/17 - 1/18/2006
SNIFFER-IP               61
SNIFFER-PHISHING         52
SNIFFER-GETRICH          29    * Excluded 115 FP's from bad rule event on 4/18 - 4/19/2006
SNIFFER-PHARMACY         25
SNIFFER-PORN             24
SNIFFER-TRAVEL           13
SNIFFER-INSURANCE         7
SNIFFER-OBFUSCATION       6
SNIFFER-DEBT              6
SNIFFER-MALWARE           4
SNIFFER-AVSOFT            3
SNIFFER-CASINO            2
SNIFFER-INK               1
SNIFFER-MEDIA             1
SNIFFER-SPAMWARE          0
It is quite notable how high the FP's are with SNIFFER-GENERAL which is where most bulk-mailers and customer reported spam rules are tagged.  This is also what my numbers show even though my customers are much less likely to reprocess bulk mail, and of course they only reprocess a small fraction of my overall FP's.  This is almost all customer reported stuff.  I score SNIFFER-GENERAL at 53% of my Hold weight.  SNIFFER-IP is another standout.  I only score SNIFFER-IP at 38% of my Hold weight and it hits less than 2% of all Sniffer hits, yet it scored comparably high so that is worth noting. The FP rate on SNIFFER-IP hasn't really changed since you made adjustments.  SNIFFER-EXPERIMENTAL is a top category that caught a lot of zombie spam which is important to many systems, but it did seem to have a high FP rate.  SNIFFER-PHISHING was worse for me until around January or February.  It seemed to have a lot of FP's on security related newsletters and chain letters.  I have mixed feelings about those things.  Maybe more efforts on white rules would help with that stuff, and I'm not totally sure if it is appropriate to block chain letters even though I detest this stuff myself.
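For a sense of proportion, the counts in the table above can be turned into shares of the total. The snippet below only re-enters the figures as listed (so the two excluded bad rule events stay excluded); the variable and function names are illustrative.

```python
# FP counts per Sniffer category, copied from the table above.
fp_counts = {
    "SNIFFER-GENERAL": 283, "SNIFFER-EXPERIMENTAL": 167, "SNIFFER-IP": 61,
    "SNIFFER-PHISHING": 52, "SNIFFER-GETRICH": 29, "SNIFFER-PHARMACY": 25,
    "SNIFFER-PORN": 24, "SNIFFER-TRAVEL": 13, "SNIFFER-INSURANCE": 7,
    "SNIFFER-OBFUSCATION": 6, "SNIFFER-DEBT": 6, "SNIFFER-MALWARE": 4,
    "SNIFFER-AVSOFT": 3, "SNIFFER-CASINO": 2, "SNIFFER-INK": 1,
    "SNIFFER-MEDIA": 1, "SNIFFER-SPAMWARE": 0,
}

total = sum(fp_counts.values())


def share(category):
    """Percentage of all listed FP's that fall in the given category."""
    return 100.0 * fp_counts[category] / total


# Print the categories from most to fewest FP's with their share of the total.
for name, count in sorted(fp_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{count:>5}{share(name):>7.1f}%")
```

The listed counts sum to 684, so SNIFFER-GENERAL accounts for roughly 41% of them and SNIFFER-IP for about 9%.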

Most FP's do in fact pass through my system, as was evidenced by the two bad rule events earlier this year.  I held around 1/5 of all hits on the bad rules, and I would expect that normal FP's in Sniffer are passed no less than 50% of the time if not 75% of the time, and likely there were 30 times as many Held FP's as my list shows over the last 15 months.

I have mitigated the effects of the overly broad rules on bulk mail by assembling a list of IP's and reverse DNS entries for bulk mailers so that I can treat them differently on my system.  Some I do block by default, but others that send a majority of legitimate messages I balance in my system so that they don't fail technical tests (such as Declude's BADHEADERS, SPAMHEADERS and HELOBOGUS) since they are not appropriate for non-zombies, and at least two hits on things like Sniffer, SURBL and SpamCop are required before something gets blocked.  The essence of the issue here is that while one mailing might be spam, it doesn't make sense to paint all of the provider's mailings as spam.  Sniffer still hits a lot of what passes through my system, but they are not blocked nearly as often as before.  I recall reporting places like roving.com, icebase.net, cheetahmail.com and other well known providers to Sniffer as false positives in the past.  I recognize that Sniffer has a lot of clients that don't care if such things get blocked, and do care a lot if spam leaks through, and it is tough to target individual lists as opposed to the entire provider.  Another subset of this is third-party tracking services that sometimes get tagged based on the fact that some of their clients do spam, and then there is the subset that advertises third-parties with direct links to their domains where those third-parties have spammed, i.e. Entertainment Books, Omaha Steaks, University of Phoenix Online, etc.  You clearly pulled those domains from spam samples, but they cross-contaminate opt-in lists through advertising links.  Based on past discussions, we may well differ on our opinions about how to deal with this, but it isn't workable to continually report such things if they continually get listed in other ways, and that's why I decided some time ago to track these providers myself.
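The "at least two hits" treatment for known bulk mailers could be expressed roughly like this. It is a simplified sketch, not Declude configuration: the test names come from the paragraph above, but the function, the set names and the idea of passing in a plain collection of failed tests are assumptions made for illustration.

```python
# Technical tests that are ignored for known bulk mailers (not appropriate for non-zombies).
TECHNICAL_TESTS = {"BADHEADERS", "SPAMHEADERS", "HELOBOGUS"}

# Tests that count toward blocking a known bulk mailer.
CORROBORATING_TESTS = {"SNIFFER", "SURBL", "SPAMCOP"}


def should_block_bulk_mailer(failed_tests, min_hits=2):
    """Block a known bulk mailer only when enough corroborating tests agree.

    Anything outside CORROBORATING_TESTS, including the technical tests,
    is ignored for these senders.
    """
    hits = set(failed_tests) & CORROBORATING_TESTS
    return len(hits) >= min_hits
```

A Sniffer hit alone, even alongside BADHEADERS or HELOBOGUS, would not block a message from one of these providers; a Sniffer hit plus a SURBL or SpamCop hit would.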

Regarding those things that are submitted to Sniffer as spam that I don't consider to be spam, that's another very tough cookie to crack.  If one customer reports E-mails from Harry & David to Sniffer as spam, and I report it as ham...who is right?  I think that this comes down to having an official definition of spam and communicating it.  Spamhaus has a definition that isn't workable in the real-world because it requires affirmative confirmation of being put on a bulk mail list by companies that you do business with, and as we know, virtually no one follows this so closely, yet many want these E-mails and everyone has the ability to unsubscribe.  My definition of spam, like Spamhaus, is that it is both bulk and unsolicited, however we differ when it comes to defining unsolicited.  I try to allow any first-party communications, advertising or otherwise, so long as they are directly related to the actions that created the relationship (i.e. no third-party offers), and so long as they honor unsubscriptions without jumping through hoops (like remembering obscure logins to stop messages).

There are of course some grey areas around the edges such as lists that mix opt-ins with harvested/bought lists, but they are fairly rare and I can't suggest a pure way to deal with such things outside of hard work and manual review, though I would discourage collection methods that can cause pollution, such as automated submissions, which for instance can report things like Harry & David because the admin blacklisted them locally, and then they show up as blocked spam that Sniffer didn't hit and are submitted.  I think this may be a contributing factor.  I would prefer that people manually report, and that they know the rules for what to report (i.e. the spam definition).

I didn't intend to draw you into this discussion within another thread or at this time, but I do think that Sniffer would benefit from some more focus on the FP issues.
I hope this helps and I am willing to lend some more ideas or opinions if you want to bounce some of your own off of me or the list.

Thanks,

Matt
