George,

That's good data to have. I would have to assume that anything tagged as gibberish in the main test would be random, and that's fairly well indicated by the somewhat tight range of the two-character strings. Unless you are using a logging feature that I'm not aware of, you are only showing the last hit that the filter produces, which explains why the Z strings are mostly bunched at the top. I've got these ordered alphabetically and will probably leave them that way for management purposes.

I will definitely use your information to reorder the counterbalances, though. I believe I made an attempt in the 2.0 filter version to order these according to what I thought would be more common as well as what would be faster to search (BODY searches are slower than other things and will generally go lower, though a BODY search for base64 goes at the top because it is fairly common). Because of this, along with the logging issue mentioned above, the hit stats aren't a perfect indication of what would save the most processing power, but they definitely help if you make some assumptions. I hadn't gathered any stats myself on the Auto-generated Codes that I added about a month ago, and it's nice to see that they're getting hit, since I was really just brainstorming about what types of things might be seen. I might remove some entries if they aren't showing any hits, though, since they are BODY searches and expensive. I'll probably still leave that list of Auto-generated Codes in alphabetical order for management purposes. This shouldn't make a big difference considering that the most common one only gets hit about 1-3% of the time (I don't know how often the filter fails on a later line, which ends up getting logged instead).
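To put rough numbers on the ordering trade-off, here is a minimal Python sketch. It is not Declude code; it assumes a scan that stops at the first entry that hits, independent hit probabilities, and made-up probabilities and costs for a cheap common entry and an expensive rare one.

```python
def expected_cost(entries):
    """Expected cost of scanning entries in order, stopping at the first hit.

    Each entry is (hit_probability, cost). Hits are assumed independent,
    which is a simplification.
    """
    total = 0.0
    reach = 1.0  # probability that the scan gets as far as this entry
    for probability, cost in entries:
        total += reach * cost
        reach *= 1.0 - probability
    return total


# Hypothetical figures: a cheap entry hit 3% of the time and an
# expensive BODY-style entry hit 0.1% of the time.
cheap_common = (0.03, 1.0)
costly_rare = (0.001, 5.0)

cost_common_first = expected_cost([cheap_common, costly_rare])
cost_rare_first = expected_cost([costly_rare, cheap_common])
print(cost_common_first, cost_rare_first)
```

With hit rates this low the two orderings come out close to each other, which fits the point that reordering a list whose top entry only hits 1-3% of the time will not change much.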

If Declude did log every line that hits in a filter, you would see things like GIBBERISH hitting some attachments thousands of times per message, and I don't think that's worth the trouble. Data like this will make a much bigger impact on performance if you run it against filters where hits can only occur once in a file due to unique data or exact matching. Kami has a bunch of those.

Thanks,

Matt



George Kulman wrote:

Matt,

I thought you might be interested in the attached data which analyzes the
GIBBERISH and ANTI-GIBBERISH filters by number of hits on my system from
11/15 through yesterday.

If you're looking for "effectiveness" you should set the entries in
descending order of probability.  I use a variation which looks at date of
most recent hit as well as hit count, although that's more important with
filters that are being modified on a continual basis rather than a fairly static
filter such as these two.
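One possible way to combine hit count with recency is sketched below in Python. The entry names, counts, and dates are invented for illustration, and using the last-hit date only as a tie-breaker is an assumption, not necessarily the variation described above.

```python
from datetime import date

# Hypothetical hit statistics: (entry, hit_count, date_of_last_hit).
stats = [
    ("zx", 40, date(2003, 12, 21)),
    ("qj", 40, date(2003, 11, 20)),
    ("base64", 310, date(2003, 12, 22)),
    ("vk", 5, date(2003, 12, 1)),
]


def order_entries(entries):
    """Sort by hit count, descending; break ties by the most recent hit."""
    return sorted(entries, key=lambda e: (e[1], e[2]), reverse=True)


ordered = [name for name, _, _ in order_entries(stats)]
print(ordered)
```

A filter that changes often might weight recency more heavily than this, for example by discounting hits older than some cutoff.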

George



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Matthew Bramble
Sent: Monday, December 22, 2003 9:52 AM
To: [EMAIL PROTECTED]
Subject: [Declude.JunkMail] GIBBERISH 2.0.1, single file filter with END functionality.



I've made some huge leaps forward recently in terms of the processing power required to run Declude with the custom filters that I have installed. This was done by way of the SKIPIFWEIGHT functionality introduced in the latest beta, but also by way of re-ordering my filters in the Global.cfg file so that the easiest to process custom filters are run first in the hopes of avoiding the need to run more costly ones.
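The effect of running cheap filters first and skipping the rest once a weight has been reached can be modeled roughly as follows. This is a simplified Python sketch, not Declude configuration syntax: the filter names, costs, and weights are hypothetical, and a single global threshold stands in for the per-filter skip setting.

```python
def run_filters(filters, skip_weight, message_hits):
    """Run filters cheapest first, stopping once skip_weight is reached.

    filters: list of (name, cost, weight) tuples.
    message_hits: set of filter names that would match this message.
    Returns (total_weight, total_cost, names_of_filters_run).
    """
    total_weight = 0
    total_cost = 0
    ran = []
    for name, cost, weight in sorted(filters, key=lambda f: f[1]):
        if total_weight >= skip_weight:
            break  # remaining, more expensive filters are skipped
        total_cost += cost
        ran.append(name)
        if name in message_hits:
            total_weight += weight
    return total_weight, total_cost, ran


# Hypothetical filters: (name, relative cost, weight).
filters = [
    ("body_scan", 50, 5),
    ("header_check", 1, 6),
    ("subject_check", 2, 6),
]
result = run_filters(
    filters,
    skip_weight=10,
    message_hits={"body_scan", "header_check", "subject_check"},
)
print(result)
```

In this example the expensive body scan never runs, because the two cheap filters already push the message past the skip weight.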


This new version of GIBBERISH makes use of functionality introduced in the 1.77 beta; however, the most recent interim release, 1.77i7, should be used in order to guarantee proper operation (initial versions would always end processing, which effectively disabled the filters). The END functionality removes the need to have ANTI filters, since the filter can be stopped before it gets to the main filter matches, and it also presents another opportunity to save on the processing power required to run such things. The filter also makes use of the MAXWEIGHT functionality to limit the max score as well as to end processing once a single hit has been scored. Note that the filter will only log (at the LOW setting) and show WARN actions when the filter is tripped and an END was not hit...which is great! No more looking at non-scoring custom filters due to counterbalances :D
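The control flow described above can be modeled in a few lines of Python. This is only an illustrative model under stated assumptions, not Declude's filter syntax or implementation: lines are plain substring checks, an END hit stops the filter with whatever has been scored so far (zero when the END lines come first), and the strings used are invented.

```python
def run_filter(lines, text, max_weight):
    """Process filter lines in order and return the resulting score.

    lines: list of (kind, needle, weight), where kind is "END" or "MATCH".
    An END hit stops processing immediately; a MATCH hit adds its weight,
    and processing stops once max_weight has been reached.
    """
    score = 0
    for kind, needle, weight in lines:
        if needle not in text:
            continue
        if kind == "END":
            return score  # counterbalance hit: stop before the main matches
        score += weight
        if score >= max_weight:
            return max_weight  # cap the score and stop scanning
    return score


# Hypothetical filter: one counterbalance line, then two main matches.
lines = [
    ("END", "tracking-code", 0),
    ("MATCH", "zq", 10),
    ("MATCH", "xj", 10),
]
print(run_filter(lines, "hello zq xj", max_weight=10))
print(run_filter(lines, "tracking-code zq", max_weight=10))
print(run_filter(lines, "hello", max_weight=10))
```

Placing the END lines first is what replaces the separate ANTI filter in this model: a counterbalanced message never reaches the main matches at all.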

Please read through the file and follow these instructions if you already have GIBBERISH installed:

1) Comment out the ANTI-GIBBERISH custom filter in your Global.cfg
2) Change the score of the GIBBERISH filter to 0 in your Global.cfg.
3) Change the scoring of the filter to match your system (it is scored by default for base 10 systems). This can be done by changing the MAXWEIGHT and Main Filter lines to reflect the multiple of 10 that your system is based on.
4) Change the SKIPIFWEIGHT score to reflect your delete weight, or whatever weight you would like for the filter to be skipped if the system has already reached it before processing the filter.
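The rescaling in step 3 is simple arithmetic, sketched here in Python. It assumes that "base 10" means the filter's weights were written for a system whose reference weight is 10, and the function name is hypothetical.

```python
def scale_weight(base10_weight, system_base):
    """Convert a weight written for a base 10 system to another base.

    system_base is assumed to be a multiple of 10 (for example 100).
    """
    return base10_weight * system_base // 10


# A weight of 10 on a base 10 system becomes 100 on a base 100 system.
print(scale_weight(10, 100))
print(scale_weight(3, 100))
```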


The file can be downloaded from the following location:



http://www.mailpure.com/software/decludefilters/gibberish/Gibberish_v2-0-1.zip

Please report any issues with the new filter format. As soon as bugs stop being reported, I will move to convert the other dual file filters into single file alternatives which make use of the END functionality. Until the functionality goes into a full release, I'm going to continue to primarily provide the old style filters on my site.

Matt




---
[This E-mail was scanned for viruses by Declude Virus (http://www.declude.com)]

---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type "unsubscribe Declude.JunkMail".  The archives can be found
at http://www.mail-archive.com.