> This was an old, old feature request/bug fix from back in the 
> Scott days, where it was desired not to include encoded base64 
> content on BODY searches (decoded content was desired).

I requested this as a change long ago for two reasons:

1) To avoid false positives where search text matches the MIME or UUENCODE 
formatting

2) To provide an instant speed up in BODY and ANYWHERE processing because 
Declude has less text to match, in particular when MIME encoding text is being 
searched for, say, an encoded PDF, DOC or JPG.

It may also have the additional benefit of being more accurate:

3) To provide for fewer false negatives, because the string being searched 
is made up more completely of actual body text.
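
To illustrate point 1, here is a minimal Python sketch (not Declude code) of how a short pattern can hit inside the base64 text of an attachment while never appearing in the readable body. The attachment bytes, file name, and pattern are all made up for the example; the bytes are chosen so that their base64 form happens to contain "HGH".

```python
import base64
import re
from email.message import EmailMessage

# Hypothetical attachment bytes, picked so the base64 text contains "HGH".
payload = base64.b64decode("vHXAHGHeG1uj")

msg = EmailMessage()
msg["Subject"] = "Scanned invoice"
msg.set_content("Please see the attached scan.\n")
msg.add_attachment(payload, maintype="application",
                   subtype="octet-stream", filename="scan.bin")

pattern = re.compile(r"HGH")

# Searching the raw message also searches the base64 lines of the attachment.
raw = msg.as_string()
raw_hit = pattern.search(raw) is not None

# Searching only the text/plain parts skips the encoded attachment entirely.
text_only = "".join(
    part.get_content()
    for part in msg.walk()
    if part.get_content_type() == "text/plain"
)
text_hit = pattern.search(text_only) is not None

print("raw message hit:", raw_hit)
print("text-only hit:  ", text_hit)
```

The shorter the pattern, the more likely it is to turn up by chance in a large encoded attachment, and skipping that text also gives the filter less to scan.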

I don't know how it was actually programmed, but the operational explanation 
from Scott years ago was that Declude decodes the message, strips various 
formatting, and concatenates it all into one very large string, and that 
string is what the BODY and ANYWHERE filters search against.

This lets Declude do a BODY match where the text is obfuscated inside of HTML, 
because the HTML tags are stripped, and it should likewise catch a phrase that 
is split by a linefeed.
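
A rough Python sketch of that idea, based only on my understanding of the explanation above and not on Declude's actual code. The function name and the sample text are invented for the example.

```python
import re
from html import unescape


def strip_for_matching(html_body: str) -> str:
    """Remove HTML tags, decode entities, and collapse all white space."""
    no_tags = re.sub(r"<[^>]*>", "", html_body)  # drops tags and comments
    text = unescape(no_tags)                     # &amp; -> &, and so on
    return re.sub(r"\s+", " ", text).strip()     # linefeeds become spaces


# The word is broken up by empty tags and a comment, and the phrase
# "Viagra now" is split across a linefeed.
sample = "Buy V<b></b>ia<!-- x -->gra\nnow and save"
flattened = strip_for_matching(sample)

print(flattened)
```

Note that removing tags without inserting a space also joins words that were separated only by markup, such as adjacent table cells, so a real implementation would have to decide which tags count as word breaks.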

I recognized that this was a major coding change, but I thought it would be 
beneficial for power users to specify the "layer" at which the text searching 
is done, e.g.

Message        (Original message format with all the warts)
MessageFixed   (Illegal characters stripped and line formats fixed)
MessageDecoded (MIME and UUENCODE converted back to 8 bit ASCII)
Text           (Only the text attachments specified, not graphics
                and not documents or other binary attachments)
TextStripped   (HTML stripped out, white space collapsed)
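
As a sketch of what these layers could look like, here is how they might be derived with Python's standard email package. This is only an illustration of the proposal: the function name, the dictionary keys, and the sample message are my own assumptions, and nothing here reflects how Declude is built.

```python
import re
from email import message_from_string, policy
from html import unescape


def build_layers(raw: str) -> dict:
    # MessageFixed: normalise line endings, drop stray control characters.
    fixed = raw.replace("\r\n", "\n").replace("\r", "\n")
    fixed = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", fixed)

    msg = message_from_string(fixed, policy=policy.default)
    leaves = [p for p in msg.walk() if not p.is_multipart()]

    # MessageDecoded: every leaf part with its transfer encoding undone.
    decoded = "\n".join(
        (p.get_payload(decode=True) or b"").decode("latin-1") for p in leaves
    )

    # Text: only the text/* parts, no graphics or binary attachments.
    text = "\n".join(
        p.get_content() for p in leaves if p.get_content_maintype() == "text"
    )

    # TextStripped: HTML tags removed, white space collapsed.
    stripped = re.sub(r"\s+", " ", unescape(re.sub(r"<[^>]*>", "", text))).strip()

    return {
        "Message": raw,
        "MessageFixed": fixed,
        "MessageDecoded": decoded,
        "Text": text,
        "TextStripped": stripped,
    }


# Made-up two-part message: an HTML body and a small base64 attachment.
raw = (
    "Subject: test\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="XX"\r\n'
    "\r\n"
    "--XX\r\n"
    "Content-Type: text/html; charset=us-ascii\r\n"
    "\r\n"
    "<p>Cheap <b>me</b>ds\r\nhere</p>\r\n"
    "--XX\r\n"
    "Content-Type: application/octet-stream\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    "vHXAHGHeG1uj\r\n"
    "--XX--\r\n"
)

layers = build_layers(raw)
print(layers["TextStripped"])
```

A filter line would then name the layer it wants to search, so a cheap test could run against TextStripped while a test for MIME structure runs against Message.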

I've removed HTML deobfuscation as a layer of this onion, as that is too 
specific a spammer technique, and is adequately covered by creative PCRE if 
the last two text layers are available.

The MessageDecoded layer is probably sufficiently represented by just the 
bones of the message, the text that makes up the framework of the message such 
as the header lines and the MIME Content-Type and boundary lines, without the 
actual text contents and without the attachments.

In the many years that I've used Declude (and been preceded by power users 
such as Sandy, Matt, and John [and superseded by Scott]) nobody has ever wanted 
to match text against the representation of an attachment, e.g. to match text 
against the representation of an executable, a specific virus, or the header of 
a TIFF file.

Andrew.



> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
> Behalf Of Matt
> Sent: Wednesday, March 14, 2007 9:21 AM
> To: declude.junkmail@declude.com
> Subject: Re: [Declude.JunkMail] PCRE FILTERING
> 
> Dave,
> 
> This was an old, old feature request/bug fix from back in the 
> Scott days, where it was desired not to include encoded base64 
> content on BODY searches (decoded content was desired).  The 
> workaround for this is to add a separator to the end of the 
> filter such as a period, comma, space, tab, or left HTML bracket.
> 
> It would also help to specify what format the BODY data would 
> come in; for instance, is a line break in the original 
> processed by the regular expression as a line break?  It 
> would be hugely beneficial to regular expressions to take the 
> BODY content and strip out all line breaks, replacing them 
> with spaces for the purpose of filtering with regex.  
> Maybe it is time to create another variable for body content 
> that is more regex friendly?  That should be easy enough to do.
> 
> Matt
> 
> 
> 
> David Barker wrote:
> > We can certainly look at doing something like that, 
> currently I am using
> > this line:
> >
> > BODY        END     CONTAINS
> Content-Transfer-Encoding: base64
> >
> > David 
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
> Behalf Of Scott
> > Fisher
> > Sent: Wednesday, March 14, 2007 10:15 AM
> > To: declude.junkmail@declude.com
> > Subject: Re: [Declude.JunkMail] PCRE FILTERING
> >
> > I'm seeing hits in the attachments too.
> > Triggered ANYWHERE PCRE filter REGEX-KEYWORDS : 
> vHXAH51eG1ujzM   (valium)
> >
> > It would be real nice to be able to search the body without 
> the attachments
> > like this.
> > BODYONLY 25  PCRE
> > (?i:v.{0,[EMAIL PROTECTED],2}[\|li1í\!].{0,2}[\|i1í\!].{0,2}[vu].{0,2}m)
> >
> > Being able to search the body without the attachments would 
> also be a time
> > saver on those BODY filters.
> >
> >
> >
> > ----- Original Message ----- 
> > From: "David Barker" <[EMAIL PROTECTED]>
> > To: <declude.junkmail@declude.com>
> > Sent: Tuesday, March 13, 2007 11:24 AM
> > Subject: [Declude.JunkMail] PCRE FILTERING
> >
> >
> > Wanted to give a sample of how the new Regular Expressions 
> are identifying
> > patterns, here is a log snip on a few patterns for Drugs:
> >
> > ANYWHERE PCRE filter FILTER-DRUGS : C1al.is [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : C1alis is [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : [EMAIL PROTECTED] [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Cia1is s [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Cial1s S [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Cialiis [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : CIALIS [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Cialis S [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : H,G,H [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : HGH [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Human Growth Hormone 
> [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : HxGxH [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : [EMAIL PROTECTED] [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Leviitra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Levitra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Levitra a [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Levltra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : v!Agr@ a [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : V_I_A_G_R_A [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : v|aGR@ [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : V1agr@ [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : V1agra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Val1um [weight -> 1]
> > ANYWHERE PCRE filter FILTER-DRUGS : [EMAIL PROTECTED]@ [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Vi[agra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Via gra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Viagr@ a [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Viagra [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Viagra a [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Viagraa [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : VlAGR@ [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : VlAGRA [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Xanax [weight -> 5]
> > ANYWHERE PCRE filter FILTER-DRUGS : Xanaxx [weight -> 5]
> >
> > These are the expressions I am using - as I am still on a 
> learning curve
> > these expressions may be improved and become more accurate. 
> While testing I
> > score relatively low just in case of FP's. I use a tool 
> called baregrep
> > http://www.baremetalsoft.com/baregrep/ which speeds through 
> huge DEBUG logs
> > pulling out entries I am looking for. Hope this helps get 
> you started with
> > PCRE, I think the Declude community can receive great value 
> from sharing
> > this type of info.
> >
> > #CIALIS
> > ANYWHERE 3 PCRE
> > 
> (?i:\bc.{0,2}[\|li1í\!].{0,[EMAIL PROTECTED],2}[\|li1í\!].{0,2}[\|i1í\
> !].{0,2}s)
> >
> > #HGH
> > ANYWHERE 5 PCRE (?i:\b(?:human growth
> > hormone|(?-i:HGH)|H.G.H)\b)
> >
> > #LEVITRA
> > ANYWHERE 5 PCRE
> > (?i:\bl.{0,2}e.{0,2}v.{0,2}[\|li1í\!].{0,2}t.{0,2}r.{0,[EMAIL PROTECTED])
> >
> > #VIAGRA
> > ANYWHERE 5 PCRE
> > (?i:v.{0,2}[\|li1í\!].{0,[EMAIL PROTECTED],2}g.{0,2}r.{0,[EMAIL PROTECTED])
> >
> > #XANAX
> > ANYWHERE 5 PCRE (?i:x.{0,[EMAIL PROTECTED],2}n.{0,[EMAIL PROTECTED],2}x)
> >
> > David Barker
> > Director of Product Management
> > Your Email security is our business
> > 978.499.2933 office
> > 978.988.1311 fax
> > [EMAIL PROTECTED]
> >
> >
> >
> > ---
> > This E-mail came from the Declude.JunkMail mailing list.  To
> > unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
> > type "unsubscribe Declude.JunkMail".  The archives can be found
> > at http://www.mail-archive.com.
> >
> 
> 
> 
> 


