]_M,
]
]Ah, 70% of all mail is spam. Last time I checked, I was running
]over 60%; both are high numbers compared to others I've seen, in
]the 35-45 range. The major difference between the domains I've
]seen that would affect this amount is the number of years the
]domains have been live/active. I understand your concerns ("For
]obvious reasons I cannot disclose how we develop our spam traps"),
]how long (months/years) does it generally take an exposed spamtrap
]to mature into a useful one?
It can take up to a year to get one rolling, and up to 3 for it to
completely mature. Much of this time is dependent upon how hard we are able
to "work" the trap. It's a surprising amount of effort - and luck.
]>Typically the 8% not captured is made up of multiple copies of new spam in
]>its early phases of deployment. We have been increasing our
]update rates to
]>compensate as our user base grows to support the extra effort.
]
]Of this 8%, what proportion are pros needing domain/IP blocks,
]and what proportion are amateurs needing things more like content
]filters? When I first started, the amateurs (yahoo.com,
]hotmail.com, and the like) were much harder to block but now that
]I've countered most (not all) of their tactics, it's the pros that
]push through with their new domains and IPs.
It's really hard to answer this question lately - the data is increasingly
unclear.
It seems they're getting quite a bit smarter (as expected). The pros push a
lot of new stuff, but it seems from our experience that even they are using
sites like yahoo, hotmail, and geocities to support their efforts - often
through some automated processes. Some delivery methods that we have seen
are quite sophisticated - often drawing on multiple domains and random
one-off web sites. We've predicted a number of advanced methodologies for
beating filters of all types and we've started to see an acceleration
toward these advanced methods... everything from delivery scattering to new
obfuscation techniques and pseudo-encryption. (We regularly war-game to plan
ahead.)
Since Message Sniffer looks at the whole message we are generally able to
apply filters to IP blocks, content, behaviors, and combinations of these...
Most of the time we find the effective rules are for content - especially
behaviors and "delivery constants" like return addresses and web links. We
find that pro spammers have access to so many networks and alternative
delivery methods that if we concentrate on the message we get a much more
accurate capture result. It's not uncommon to see the same spam message
delivered through many different network blocks. Also, they are getting
smarter about the network blocks they use - often choosing small blocks
interspersed with blocks allocated to legitimate systems... the result is
that it is becoming far more difficult to block networks without introducing
false positives... Network and IP blocking must become much more specific to
avoid collateral damage.
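The point about "delivery constants" can be sketched in code. Message Sniffer's actual rule engine is not public, so this is only an illustration under assumed rule patterns (the domains here are hypothetical): the same spam payload arrives through unrelated network blocks, but the return address and advertised link stay constant, so content rules catch it where IP blocking would not.

```python
import re

# Hypothetical content rules keyed on "delivery constants":
# the return address and the advertised web link, which stay
# the same no matter which network block delivers the message.
CONTENT_RULES = [
    re.compile(r"Return-Path:.*@bulk-offer\.example", re.I),
    re.compile(r"https?://[\w.-]*one-off-site\.example", re.I),
]

def is_spam(raw_message: str) -> bool:
    """Return True if any content rule matches the whole message."""
    return any(rule.search(raw_message) for rule in CONTENT_RULES)

# Same payload, two unrelated source networks - both are caught:
msg_a = ("Received: from [203.0.113.9]\n"
         "Return-Path: <ads@bulk-offer.example>\nBuy now!")
msg_b = ("Received: from [198.51.100.7]\n"
         "Return-Path: <ads@bulk-offer.example>\nBuy now!")
```

A rule keyed on the sending IP would have needed two separate (and collateral-damage-prone) network entries to get the same result.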
<snip>
]This reflects
]>what a "tuned" system's false positive rate can be.
]
]Yes, though their simplicity is part of their appeal, it's simply a
]way to gather fresh samples and see if the 'old' tests still work.
] My next step in the battle against FPs will be having 2 Declude
]servers, one to build new tests on and weed out FPs using domains
]that can afford them, then another with domains that get less
]monitoring and need a higher level of care. 4 per week is
]amazingly low, but clearly shows what you mean by tuned; may I ask
]how many months/years it took to get there?
It appears that most systems can achieve nearly this level in only a few
weeks. Our filter base was in development for about 2 years before we began
deploying Message Sniffer.
<snip>
]>The chief error in this metric is that there is no control on how
]many false
]>positives occur that may not be reported.
]
]How easy is it for your customers to monitor/review what gets caught?
That's really up to them. Message Sniffer only tags the message. The actions
they take after that are up to them... If they place the messages in a
holding bin and then review them there is a good chance they will see any
false positives - as long as the volume on their system will allow it.
Another way to do it is to mark messages as "suspected spam" and allow the
end users to search for false positives... Hopefully the end users can also
be encouraged to report the false positives they find... (often they do so
with some "intensity" and without prompting)
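The "mark as suspected spam" approach above can be sketched with Python's standard email library. This is not how Message Sniffer itself tags messages (it only tags; the header and prefix names below are assumptions) - it just shows the tag-don't-delete idea that lets end users search for false positives later.

```python
from email.message import EmailMessage

def tag_suspected_spam(msg: EmailMessage) -> EmailMessage:
    """Tag rather than delete, so end users can search their
    'suspected spam' folder for false positives later."""
    msg["X-Suspected-Spam"] = "yes"     # hypothetical header name
    subject = msg["Subject"] or ""
    del msg["Subject"]                   # replace, don't duplicate
    msg["Subject"] = "[SUSPECTED SPAM] " + subject
    return msg

m = EmailMessage()
m["Subject"] = "Quarterly report"
m.set_content("See attached.")
tag_suspected_spam(m)
```

The message still reaches the mailbox intact; a mail client rule on the header (or subject prefix) can file it for review.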
The challenge is that it takes a human being to do this kind of checking and
so it is expensive. Even if the technology were perfect the specifications
change with time and location... different users, different systems/locales,
and different times constantly change the definition of spam/not-spam from
each individual's perspective. Our goal is to leverage any overlap that
exists in this definition while allowing for all of the differences.
Ultimately it takes the people on the receiving end to make the distinction.
A final thought about false positives - If there is a false positive and
nobody ever notices then is that really a false positive? There is a famous
tactic of leaders and administrators who regularly clear their desk by
dumping everything in the trash without review. The rationale is that if
there's an important message there it will resurface... I wonder if that is
one reason we have so few reports of false positives??
]>Currently, the
]>one-size-fits-all system is designed not to be too strict for small ISP's
]>while still being strict enough for most small corporate offices.
]
]Do you see then, a difference between the kind of mail small
]business get and the kind individual ISP customers get? If so,
]can the difference be defined, the nature of use or type of email received?
What we see is that it tends to fall into soft groups - but it is nearly
impossible to predict which groups apply to any particular user or system.
The fact that there are some groupings and categories means that there is an
opportunity to leverage that filtering effort over a wider population and
gain some advantage.
What we also see is that the groupings are not necessarily formalizable -
with a few exceptions here and there... For example, you can generally
create a group for Pornography that most folks will agree upon, but then
there are other groups - such as financial advice - that don't resolve with
any real precision.
This is why we're applying AI techniques to the process. Message Sniffer is
only one component of an intelligent, evolving system. This system is being
designed and built to "learn" where these groupings are, and then to adapt
these categories to individual subscriber's needs. The system is also
designed to adapt over time as these groupings change and evolve. It's
important, after all, to recognize that the filtering process itself will
have a systemic impact on the character of the messages being sent and
filtered.
]>In the mean time Declude offers all of the additional flexibility required
]>to tune this model for each local system.... Declude is by far
]the most flexible we've seen
]
]Agreed, and it doesn't hurt that its creator is ready, able, and
]willing to adapt it as issues arise. Does your comment (tune
]this model) indicate you run separate configurations for different
]types or styles of customers?
Yes. For our own customers we adapt - usually with Declude. In some cases we
even develop separate rule bases for Message Sniffer. We use whatever level
of specialization is required for each user group.
I know that many of our Message Sniffer users also adapt the system to their
end users' needs.
I hope these insights are helpful.
Thanks!
_M
---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]
---
This E-mail came from the Declude.JunkMail mailing list. To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail". The archives can be found
at http://www.mail-archive.com.