[sniffer]Re[2]: [sniffer]Re[2]: [sniffer]Re[2]: [sniffer]A design question - how many DNS based tests?

2006-06-07 Thread Pete McNeil
Hello Darin,

Wednesday, June 7, 2006, 5:09:27 PM, you wrote:

snip/

That would be a bad idea, sorry. After 30 days (heck, after 2) spam is
usually long since filtered, or dead. As a result, looking at 30-day-old
spam would have a cost, but little benefit.

 You misinterpreted what I was saying.  I was not at all suggesting sending
 old spam.  What I was talking about was copying spam@ with spam that does
 not fail sniffer _as it comes in_, or _during same day/next day reviews_

Sorry, I did misinterpret then. _as it comes in_ is good, provided the
weights are high enough to prevent a lot of FPs. We're all trained
pretty well on how to skip those - but the more we see, the more
likely we are to slip up ;-)

What we do use from time to time are virtual spamtraps. In a virtual
spamtrap scenario, you can submit spam that reached a very high (very
low false positive) score but did not fail SNF. Generally this is done
by copying the message to a pop3 account that can be polled by our
bots.
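A poller of that kind can be sketched in a few lines. Everything below (the host, the account, deleting after retrieval) is an illustrative assumption, not a description of the actual bots:

```python
# Hypothetical sketch of a virtual-spamtrap poller of the kind described
# above: fetch every message from a POP3 account, parse it, and delete it.
# The host, credentials, and cleanup policy are assumptions for illustration.
import poplib
from email import message_from_bytes

def poll_spamtrap(host, user, password):
    """Retrieve (and delete) all messages waiting in a POP3 spamtrap."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    count, _octets = conn.stat()
    messages = []
    for i in range(1, count + 1):
        _resp, lines, _size = conn.retr(i)
        messages.append(message_from_bytes(b"\r\n".join(lines)))
        conn.dele(i)  # drop the message once the bot has a copy
    conn.quit()
    return messages
```

The submitting side is then just a mail filter that copies qualifying messages to that account.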

 That is exactly what I was suggesting.  We'll put it on our list to write a
 filter to do so when time permits.  Just trying to help.

Thanks very much!

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.


#
This message is sent to you because you are subscribed to
  the mailing list sniffer@sortmonster.com.
To unsubscribe, E-mail to: [EMAIL PROTECTED]
To switch to the DIGEST mode, E-mail to [EMAIL PROTECTED]
To switch to the INDEX mode, E-mail to [EMAIL PROTECTED]
Send administrative queries to  [EMAIL PROTECTED]



[sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Pete McNeil
Hello Sniffer Folks,

I have a design question for you...

How many DNS based tests do you use in your filter system?

How many of them really matter?

Thanks!

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.





Re: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Peer-to-Peer (Support)
Hi _M,

Do you mean like reverse PTR records, or HELO lookups, etc..?

--Paul R.


-Original Message-
From: Message Sniffer Community [mailto:[EMAIL PROTECTED]
Behalf Of Pete McNeil
Sent: Tuesday, June 06, 2006 9:26 AM
To: Message Sniffer Community
Subject: [sniffer]A design question - how many DNS based tests?


Hello Sniffer Folks,

I have a design question for you...

How many DNS based tests do you use in your filter system?

How many of them really matter?

Thanks!

_M

-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.












Re: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Nick Hayer

Hi Pete,

Pete McNeil wrote:


How many DNS based tests do you use in your filter system?
 


approx 100


How many of them really matter?
 


depends  :)
I generally weight them all very low; it's the combination of several 
that makes each 'matter'.  As I review held mail I remove ones that are 
blatant FPs; double up on some by considering the last hop as a 
preference over any hop, etc.
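The weight-them-all-very-low approach reads naturally as a simple additive score. The list names, weights, and hold threshold below are invented for illustration, not Nick's actual configuration:

```python
# Sketch of low-weight-per-list scoring: each DNS test contributes a small
# score, and only the combination of several hits reaches the hold
# threshold. All names and numbers here are illustrative assumptions.
WEIGHTS = {
    "cbl": 4,
    "spamcop": 3,
    "dul": 2,
    "rhsbl-a": 2,
    "rhsbl-b": 2,
}
HOLD_THRESHOLD = 8

def score(hits):
    """Sum the weights of the DNS tests that fired on a message."""
    return sum(WEIGHTS.get(name, 0) for name in hits)

def disposition(hits):
    """A single list can never hold a message; several together can."""
    return "hold" if score(hits) >= HOLD_THRESHOLD else "deliver"
```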


-Nick


Thanks!

_M

 







[sniffer]Re[2]: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Pete McNeil
Hello Peer-to-Peer,

That's a good point.

Any kind, perhaps by category.

I was originally thinking of just RBLs of various types.

Thanks,

_M

Tuesday, June 6, 2006, 9:46:01 AM, you wrote:

 Hi _M,

 Do you mean like reverse PTR records, or HELO lookups, etc..?

 --Paul R.


 -Original Message-
 From: Message Sniffer Community [mailto:[EMAIL PROTECTED]
 Behalf Of Pete McNeil
 Sent: Tuesday, June 06, 2006 9:26 AM
 To: Message Sniffer Community
 Subject: [sniffer]A design question - how many DNS based tests?


 Hello Sniffer Folks,

 I have a design question for you...

 How many DNS based tests do you use in your filter system?

 How many of them really matter?

 Thanks!

 _M




-- 
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.





Re: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Scott Fisher

I use about 100 DNSBL/RBL/RHSBL lists of varying weights and reliabilities.

How many matter...
I'd have to say the shining star is CBL. It hits 45% of the spam with a very 
low false positive rate.

The relay RBLs' best days are way behind them.
The proxy RBLs' most useful days are behind them.
The DUL RBLs, I don't think, have ever been comprehensive/correct enough to be 
as useful as they should be in the day of the spam zombie.
The spam source RBLs (other than CBL) are a little over-zealous for me, 
causing some false positive problems, thus the lower weight. They seem 
to be on the downtrend too. Oddly, Fiveten Spam (127.0.0.2) has had a big 
jump in the last two months, catching 60% of the spam, although with a 1% 
false positive rate.


I have 2 1/4 years of my spam test results posted at
All tests: http://it.farmprogress.com/declude/Testsbymonth.html
Spam tests: http://it.farmprogress.com/declude/spamtestbymonth.html
ham tests:  http://it.farmprogress.com/declude/hamtestsbymonth.html

- Original Message - 
From: Pete McNeil [EMAIL PROTECTED]

To: Message Sniffer Community sniffer@sortmonster.com
Sent: Tuesday, June 06, 2006 8:26 AM
Subject: [sniffer]A design question - how many DNS based tests?



Hello Sniffer Folks,

I have a design question for you...

How many DNS based tests do you use in your filter system?

How many of them really matter?

Thanks!

_M

--
Pete McNeil
Chief Scientist,
Arm Research Labs, LLC.











Re: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Colbeck, Andrew
I use just shy of 60 DNS based tests against the sender, both IP4R and
RHSBL.

Perhaps 10-12 matter.

Due to false positives, I rate most of them relatively low and have
built up their weights as a balancing act.  That act is greatly assisted
by using a weighting system rather than rejecting on first hit, and
furthered by being able to do combo tests such as the example Nick
offered on a different thread this morning.

SPAMHAUS XBL (CBL and the Blitzed OPM), SPAMCOP, FIVETEN, and MXRATE-BL
are consistently good performers for me.

Tests that I try out tend to stay in my configuration after they've
become inutile, as long as they do no harm.  I groom the lists perhaps
four times per year.
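A combo test of the kind mentioned above might be sketched as two individually low-weight lists that earn extra weight only when both fire on the same message. The list names and bonus value are assumptions for illustration:

```python
# Sketch of a "combo" DNS test: each member list alone carries little
# weight, but agreement between all combo members adds a larger bonus.
# The default list names and bonus weight are illustrative assumptions.
def combo_weight(hits, combo=("spamcop", "fiveten"), bonus=5):
    """Return the extra weight earned when every combo member fired."""
    return bonus if all(name in hits for name in combo) else 0
```

Because both underlying lists had to agree, the bonus can safely be larger than either list's individual weight.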

Andrew 8)



 -Original Message-
 From: Message Sniffer Community 
 [mailto:[EMAIL PROTECTED] On Behalf Of Pete McNeil
 Sent: Tuesday, June 06, 2006 6:26 AM
 To: Message Sniffer Community
 Subject: [sniffer]A design question - how many DNS based tests?
 
 Hello Sniffer Folks,
 
 I have a design question for you...
 
 How many DNS based tests do you use in your filter system?
 
 How many of them really matter?
 
 Thanks!
 
 _M
 
 --
 Pete McNeil
 Chief Scientist,
 Arm Research Labs, LLC.
 
 
 
 





[sniffer]AW: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Markus Gufler
I use around 80 tests on one system in order to watch how their
performance goes up and down. On other (high traffic) servers I use only
the best ones.
I can confirm what others have mentioned as reliable blacklists (except
Fiveten for European systems: Fiveten has an FP rate of around 10%, and it
seems the FPs are caused by IP addresses outside of America).

However, I give each IP4R test only a relatively small weight (between 1 and
10% of the hold weight). There is one combo test that has a list of the
most reliable IP blacklists. This combo test is nearly as effective as Sniffer,
but it has definitely more FPs.
The combination of IP4R tests is further used together with other
reliable tests, and I also use it to add different weights for positive
IP4R results depending on the originating country.

Some weeks ago one of my servers was no longer able to reach the configured
DNS server (a reconfigured firewall), and even though most spam was still
caught, there was a noticeable reduction in spam detection until I discovered
the problem.
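For reference, an IP4R lookup is just an A-record query on the reversed octets under the blacklist's zone, and the failure mode described above can be softened by treating DNS errors as "not listed". A sketch (the zone name is only an example; any IP4R-style list works the same way):

```python
# Sketch of a basic IP4R (DNSBL) lookup. The default zone is an example.
# A DNS failure (like the unreachable-resolver incident described above)
# is treated as "not listed", so a broken resolver quietly reduces
# detection instead of blocking mail outright.
import socket

def reverse_octets(ip):
    """127.0.0.2 -> 2.0.0.127"""
    return ".".join(reversed(ip.split(".")))

def ip4r_listed(ip, zone="zen.spamhaus.org", timeout=3.0):
    """Return True if `ip` has an A record under the DNSBL zone."""
    socket.setdefaulttimeout(timeout)
    try:
        socket.gethostbyname(f"{reverse_octets(ip)}.{zone}")
        return True                  # any answer means "listed"
    except OSError:
        return False                 # NXDOMAIN, timeout, or DNS failure
```

A filter that instead *held* mail on DNS failure would fail closed; the choice above fails open, matching the behavior Markus observed.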

Markus




 -Original Message-
 From: Message Sniffer Community 
 [mailto:[EMAIL PROTECTED] On Behalf Of Colbeck, Andrew
 Sent: Tuesday, June 6, 2006 6:09 PM
 To: Message Sniffer Community
 Subject: Re: [sniffer]A design question - how many DNS based tests?
 
 I use just shy of 60 DNS based tests against the sender, both 
 IP4R and RHSBL.
 
 Perhaps 10-12 matter.
 
 Due to false positives, I rate most of them relatively low 
 and have built up their weights as a balancing act.  That act 
 is greatly assisted by using a weighting system and not 
 reject on first hit, and furthered by being able to do 
 combo tests such as the example Nick offered on a different 
 thread this morning.
 
 SPAMHAUS XBL (CBL and the Blitzed OPM), SPAMCOP, FIVETEN, 
 MXRATE-BL are consistent good performers for me.
 
 Tests that I try out tend to stay in my configuration after 
 they've become inutile as long as they do no harm.  I groom 
 the lists perhaps four times per year.
 
 Andrew 8)
 
 
 
  -Original Message-
  From: Message Sniffer Community
  [mailto:[EMAIL PROTECTED] On Behalf Of Pete McNeil
  Sent: Tuesday, June 06, 2006 6:26 AM
  To: Message Sniffer Community
  Subject: [sniffer]A design question - how many DNS based tests?
  
  Hello Sniffer Folks,
  
  I have a design question for you...
  
  How many DNS based tests do you use in your filter system?
  
  How many of them really matter?
  
  Thanks!
  
  _M
  
  --
  Pete McNeil
  Chief Scientist,
  Arm Research Labs, LLC.
  
  
  
  
 
 
 
 






[sniffer]Re[2]: [sniffer]A design question - how many DNS based tests?

2006-06-06 Thread Pete McNeil
Hello Matt,

Tuesday, June 6, 2006, 12:37:56 PM, you wrote:

snip/

 appropriately and tend to hit less often, but the FP issues with
 Sniffer have grown due to cross checking automated rules with other
 lists that I use, causing two hits on a single piece of data.  For
 instance, if SURBL has an FP on a domain, it is possible that
 Sniffer will pick that up too based on an automated cross reference,
 and it doesn't take but one  additional minor test to push something
 into Hold on my system.

Please note. It has been quite some time now that the cross-reference
style rule-bots have been removed from our system. In fact, at the
present time we have no automated systems that add new domain rules.

Another observation I might point out is that many RBLs will register
a hit on the same IP - weighting systems using RBLs actually depend on
this. An IP rule hit in SNF should be treated similarly to other RBL-type
tests. This is one of the reasons we code IP rules to group 63 - so
that they are trumped by a rule hit in any other group and therefore
are easily isolated from the other rules.

snip/

 handling false positive reports with Sniffer is cumbersome for both
 me and Sniffer.

The current process has a number of important goals:

* Capture as much information as possible about any false positive so
that we can improve our rule coding processes.

* Preserve the relationship with the customer and ensure that each
case reaches a well-informed conclusion with the customer's full
knowledge.

* Protect the integrity of the rulebase.

This link provides a good description of our false positive handling
process:

http://kb.armresearch.com/index.php?title=Message_Sniffer.FAQ.FalsePositives

Can you recommend an alternate process, or changes to the existing
process that would be an improvement and would continue to achieve
these goals? We are always looking for ways to improve.

 I would hope that any changes
 seek to increase accuracy above all else.  Sniffer does a very good
 job of keeping up with spam, and its main issues with leakage are
 caused by not being real-time, but that's ok with me.  At the same
 time Sniffer is the test most often a part of false positives, being
 a contributing factor in about half of them.

Log data shows that SNF tags, on average, more than 74% of all email
traffic, and typically a significantly higher percentage of spam.

It seems likely that SNF would also be represented heavily in the
percentage of false positives (relative to other tests with lower
capture rates) on any given system, since it is represented heavily
in email traffic as a whole.

You've also indicated that you weight SNF differently than your other
tests - presumably giving it more weight (this is frequently the case
on many systems).

How much do you feel these factors contribute to your findings?

 About 3/4 of all FPs (things that are blocked by my system) are
 some form of automated or bulk E-mail.  That's not to say that other
 tests are more accurate; they are just scored more appropriately and
 tend to hit less often, but the FP issues with Sniffer have grown
 due to cross-checking automated rules with other lists that I use,
 causing two hits on a single piece of data,

With regard to causing two hits on a single piece of data: SNF employs
a wide variety of techniques to classify messages, so it is likely that
a match in SNF will coincide with a match in some other test. In fact,
as I pointed out earlier, filtering systems that apply weights to
tests depend on this very fact to some extent.

What makes weighting systems powerful is that when more than one test
triggers on a piece of data, such as an IP or URI fragment, the events
leading up to that match were distinct for each matching test. This is
the critical component in reducing errors through a voting process.

Test A uses process A to reach conclusion Z.

Test B uses process B to reach conclusion Z.

Process A is different from process B, so the inherent errors in
process A are different from the errors in process B, and we presume
it is unlikely that an error in Test A will occur under the same
conditions as an error in Test B.

If a valid test result is the signal we want, and an erroneous test
result is noise on top of that signal then it follows:

By combining the results of Test A and Test B we have the opportunity
to increase the signal-to-noise ratio to the extent our assumptions
about errors are true. In fact, if no error occurs in both A and B
under the same circumstances, then defining a new test C as ((A+B)/2)
will produce a signal that is twice as clear as test A or B on its
own.
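The voting argument can be checked with a toy simulation: if two tests each err independently with probability p, they err together with probability about p squared, so requiring agreement sharply cuts the combined error rate. The error model below is an assumption chosen purely for illustration:

```python
# Toy Monte Carlo check of the voting argument: Test A and Test B each
# err independently with probability p, so the chance they err on the
# same message is about p*p. Independence of the two processes is the
# assumption the whole argument rests on.
import random

def simulate(p_error=0.05, trials=100_000, seed=1):
    """Return (error rate of A alone, rate of A and B erring together)."""
    random.seed(seed)
    errors_a = errors_both = 0
    for _ in range(trials):
        a_wrong = random.random() < p_error   # process A's independent error
        b_wrong = random.random() < p_error   # process B's independent error
        errors_a += a_wrong
        errors_both += a_wrong and b_wrong
    return errors_a / trials, errors_both / trials
```

With p = 0.05 the joint error rate comes out near 0.0025, an order of magnitude below either test alone, which is the signal-to-noise gain described above.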

If I follow what you have said about false positives and SNF matching
other tests, then you are describing a situation where the process for
SNF and the alternate tests are the same - or put another way, that
SNF somehow represents a copy of the other test and so will also
contain the same errors. If that's the case then the