Re: [sniffer] mini-obfuscation

2005-03-23 Thread Darrell (supp...@invariantsystems.com)
Pete, 

Doesn't Sniffer have a certain level of support for regexes?  I know we have 
had good luck with regexes like this one, which catches obfuscated spellings 
of viagra under Declude.  We found it easier to use regexes than to list all 
of the different variations. 

(?:\b|\s)[_\W]{0,3}(?:\\\/|V)[_\W]{0,3}[ij1!|l\xEC\xED\xEE\xEF][_\W]{0,3}[a4 
[EMAIL PROTECTED],3}[xyz]?[gj][_\W]{0,3}r[_\W]{0,[EMAIL PROTECTED], 
3}x?[_\W]{0,3}(?:\b|\s) 
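
The pattern above is partly masked in this archive copy, so what follows is not a reconstruction of it. It is only a minimal Python sketch of the same idea: one character class per letter, with up to three underscore or non-word characters allowed between letters. The LOOKALIKES table and the obfuscation_pattern helper are assumptions made for illustration, not part of Declude or Sniffer.

```python
import re

# Hypothetical look-alike table, for illustration only.
LOOKALIKES = {
    "v": "v",
    "i": "i1!|l",
    "a": "a4@",
    "g": "g9",
    "r": "r",
}

# Up to three underscore or non-word characters between letters.
GAP = r"[_\W]{0,3}"


def obfuscation_pattern(word):
    """Build a case-insensitive regex for spaced-out or substituted spellings of word."""
    parts = []
    for ch in word.lower():
        chars = LOOKALIKES.get(ch, ch)
        # re.escape keeps characters such as | or - literal inside the class.
        parts.append("[" + re.escape(chars) + "]")
    # The lookarounds stop the pattern from matching inside a longer word.
    return re.compile(
        r"(?<![A-Za-z0-9])" + GAP.join(parts) + r"(?![A-Za-z0-9])",
        re.IGNORECASE,
    )


pattern = obfuscation_pattern("viagra")
```

Under these assumptions the compiled pattern should match strings such as "V.i.a.g.r.a" or "v1agra" while leaving "via grant" alone. The look-alike table is the part that would need regular upkeep as new substitutions show up.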

Darrell

Check out http://www.invariantsystems.com for utilities for Declude and 
IMail.  IMail/Declude Overflow Queue Monitoring, SURBL/URI integration, MRTG 
Integration, and Log Parsers. 

Pete McNeil writes: 

On Tuesday, March 22, 2005, 8:31:07 PM, Andrew wrote: 

snip/ 

CA How many times have we all been frustrated by a piece of spam ending
CA up in *OUR* mailbox that was so close in content to spam we whacked
CA yesterday? 

CA I thought the top n obfuscations might be interesting to look at, and
CA perhaps a shortcut  (temporary, albeit) for spam catching.  I thought we
CA might see whether, for example, broken URLs, fake comments, or high-bit
CA ASCII character substitutions were the obfuscation technique du jour. 

Here you hit it IMHO. The reality appears to be, from my experience,
that small domains of obfuscation patterns rise and fall like swells
on the ocean. That is, stability tends to arise in one domain of
message characteristics and then fall to rise in another domain.
Sometimes the domain is well understood and sometimes it is entirely
new and forces us to think differently about what a feature really
is. 

By domain I mean things like message structure, word obfuscation
techniques, phrase-based swapping, HTML exploitation, etc. 

The du jour part of your statement is a key element to the problem.
Defining and re-defining the conceptual framework that describes
feature domains in the spam is the other key element. 

Put more simply - knowing what to look for is a basic element, but it
gets you nowhere on its own. Knowing (recognizing) when to look for
the what is the key that makes the problem workable. 

CA A while back curiosity got the better of me (it was raining, and
CA I had a few days off) and I did a few grep sweeps on a warm spam
CA corpus. 

CA I was disappointed in my success rate for: 

CA v.?i.?a.?g.?r.?a.? 

CA and similar queries with deliberate substitutions (e.g. using a 1
CA for i).  I started writing a grep-generating-permutation engine and
CA decided my time was better spent on scritching my cat under his chin. 

That is a nifty direction that I wish I had more time for. Perhaps I
will some day soon when Sniffer gets slashdotted and sales go through
the roof! 

--- meantime, back on this planet, I suggested a very similar thing to
Paul Graham at the first spam conference at MIT. As I recall he said
it was ambitious - a description that I have learned has a special
meaning in scientific circles. Something having to do with avian swine
and snowballs that have successful careers as tour guides in hell. 

One of these days I think I might do it anyway, just to prove the
point, but in the meantime I too prefer to spend more time with my
cat. ;-) 

Don't get me wrong - I strongly believe it can be done this way, but
it requires much more than good technology. It runs right into one of
the biggest problems with AI and, perhaps more importantly, people's
expectations of AI. No matter how good the pattern learning system
might be it will always lack the human experience. Computers don't
date or gain weight - so they have a hard time understanding what much
of the spam is about simply by looking at the patterns. That's why
the Message Sniffer process is designed with people tightly integrated
into the system. 

_M 

 

This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html



Re: [sniffer] mini-obfuscation

2005-03-22 Thread Pete McNeil
On Tuesday, March 22, 2005, 8:31:07 PM, Andrew wrote:

snip/

CA How many times have we all been frustrated by a piece of spam ending
CA up in *OUR* mailbox that was so close in content to spam we whacked
CA yesterday?

CA I thought the top n obfuscations might be interesting to look at, and
CA perhaps a shortcut  (temporary, albeit) for spam catching.  I thought we
CA might see whether, for example, broken URLs, fake comments, or high-bit
CA ASCII character substitutions were the obfuscation technique du jour.

Here you hit it IMHO. The reality appears to be, from my experience,
that small domains of obfuscation patterns rise and fall like swells
on the ocean. That is, stability tends to arise in one domain of
message characteristics and then fall to rise in another domain.
Sometimes the domain is well understood and sometimes it is entirely
new and forces us to think differently about what a feature really
is.

By domain I mean things like message structure, word obfuscation
techniques, phrase-based swapping, HTML exploitation, etc.

The du jour part of your statement is a key element to the problem.
Defining and re-defining the conceptual framework that describes
feature domains in the spam is the other key element.

Put more simply - knowing what to look for is a basic element, but it
gets you nowhere on its own. Knowing (recognizing) when to look for
the what is the key that makes the problem workable.

CA A while back curiosity got the better of me (it was raining, and
CA I had a few days off) and I did a few grep sweeps on a warm spam
CA corpus.

CA I was disappointed in my success rate for:

CA v.?i.?a.?g.?r.?a.?

CA and similar queries with deliberate substitutions (e.g. using a 1
CA for i).  I started writing a grep-generating-permutation engine and
CA decided my time was better spent on scritching my cat under his chin.
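
To put a rough number on why the grep sweep above disappoints, here is a small Python sketch. The pattern v.?i.?a.?g.?r.?a.? allows one extra character after each letter, but it still requires the literal letters themselves, so any substitution defeats it. The SUBS table and the variants helper are assumptions for illustration and not taken from any real corpus.

```python
import re
from itertools import product

# The grep pattern from the message, used as a Python regex.
loose = re.compile(r"v.?i.?a.?g.?r.?a.?", re.IGNORECASE)

# Hypothetical per-letter substitution lists, for illustration only.
SUBS = {
    "v": ["v"],
    "i": ["i", "1", "!", "l"],
    "a": ["a", "4", "@"],
    "g": ["g"],
    "r": ["r"],
}


def variants(word):
    """Yield every spelling of word formed by per-letter substitution."""
    choices = [SUBS.get(ch, [ch]) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)


all_variants = list(variants("viagra"))
# Spellings the loose pattern fails to find.
missed = [v for v in all_variants if not loose.search(v)]
```

With this small table, all_variants holds 36 spellings and the loose pattern finds only the unmodified one. Adding optional separators between letters multiplies the count again, which is one argument for matching with character classes instead of enumerating permutations.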

That is a nifty direction that I wish I had more time for. Perhaps I
will some day soon when Sniffer gets slashdotted and sales go through
the roof!

--- meantime, back on this planet, I suggested a very similar thing to
Paul Graham at the first spam conference at MIT. As I recall he said
it was ambitious - a description that I have learned has a special
meaning in scientific circles. Something having to do with avian swine
and snowballs that have successful careers as tour guides in hell.

One of these days I think I might do it anyway, just to prove the
point, but in the meantime I too prefer to spend more time with my
cat. ;-)

Don't get me wrong - I strongly believe it can be done this way, but
it requires much more than good technology. It runs right into one of
the biggest problems with AI and, perhaps more importantly, people's
expectations of AI. No matter how good the pattern learning system
might be it will always lack the human experience. Computers don't
date or gain weight - so they have a hard time understanding what much
of the spam is about simply by looking at the patterns. That's why
the Message Sniffer process is designed with people tightly integrated
into the system.

_M



