Re: [sniffer] Possible blip?

2004-05-21 Thread Scott Fisher
2 thoughts from me:

1. Right on on the Nigerian scams, possibly keeping these rules longer. As I was 
forwarding out a Nigerian scam to the spam mailbox, I too wondered how long the 
Nigerian rules were kept in play. I might also add that Nigeria's twin sister, the 
International Lottery spam, and Stock spams might also be kept longer. I noticed an 
increase in the Stock spams this week. 

2. I've been tracking different character sets for a couple of weeks; the Chinese, 
Cyrillic and Korean look promising. I get false hits on Greek, Thai, and Vietnamese 
headers.

Scott Fisher
Director of IT
Farm Progress Companies

 [EMAIL PROTECTED] 05/21/04 12:42PM 
Pete,

Our Hold range has returned to more normal territory on Thursday.  
Here's the stats from the week as a whole on what has been very 
consistent traffic.  Out of all E-mail processed, both good and bad, the 
%Hold represents what scored between 10-24 points on our system and 
needed review, the %Sniffer represents all Sniffer hits except for Gray, 
the %Spam is what we scanned and didn't deliver (generally about 99.8% 
of spam is caught at a score of 10 which this is based on), and the 
Sniffer/Spam is the percentage of Sniffer hits as a portion of messages 
scoring 10 or more.

Day    %Hold   %Sniffer  %Spam    Sniffer/Spam
Mon:   1.86%   77.27%    80.37%   96.14%
Tue:   2.83%   74.53%    79.37%   93.39%
Wed:   2.13%   77.60%    79.66%   97.41%
Thur:  1.95%   76.50%    80.66%   94.84%
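
As a quick check of how the last column follows from the others, here is a minimal Python sketch that divides %Sniffer by %Spam for each day. The names are illustrative and not from the thread; the figures are copied from the table. For Tuesday the quotient of the posted figures comes out nearer 93.90% than the posted 93.39%, so that entry may have been computed from unrounded counts or mistyped.

```python
# Figures copied from the table: day -> (%Hold, %Sniffer, %Spam, posted Sniffer/Spam)
stats = {
    "Mon": (1.86, 77.27, 80.37, 96.14),
    "Tue": (2.83, 74.53, 79.37, 93.39),
    "Wed": (2.13, 77.60, 79.66, 97.41),
    "Thur": (1.95, 76.50, 80.66, 94.84),
}


def sniffer_share(sniffer_pct, spam_pct):
    """Sniffer hits as a percentage of messages scoring 10 or more."""
    return round(100.0 * sniffer_pct / spam_pct, 2)


# Recompute the last column from the two columns it is derived from.
computed = {day: sniffer_share(row[1], row[2]) for day, row in stats.items()}

for day, value in computed.items():
    print(day, value, "posted:", stats[day][3])
```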

The only change that we made to our system was to add two smaller 
domains later in the week, and we introduced filters for Cyrillic and 
Chinese languages on Wednesday morning which have cut our hold file down 
by 0.38 percentage points on Thursday, which explains how our %Hold is 
lower on Thursday than on Wednesday with a lower Sniffer hit rate on spam.

I did note two high volume untagged static spammers on Tuesday that we 
blacklisted locally, and that combined with the increase in Sniffer 
change rates (spam storm) might account for the changes that I saw.  I 
am wondering though about the recommendations that you have made for 
possibly fine tuning our rule base.  Again though, please keep in mind 
that I still feel that performance is overall very, very good.

One of my thoughts regarding minimum rule strengths and grace periods is 
that all groups aren't necessarily the same.  For instance Nigerian 
scams are low volume and sporadic, and my system performs the worst on 
these things.  Maybe lower rule strengths and longer grace periods make 
much more sense for the Phishing category than they do for many other 
categories, for instance.  Is that possible?

I also looked up the rule strengths on your site and found that about 
50%, or maybe more, have a strength below 1, and maybe lowering that is 
worth testing out so long as I don't massively increase the number of 
records.  I do think though that I would like to test out extending the 
grace period.  Most of my false positives are not on things that this 
would affect, and that might give niche sources a little extra coverage 
if I understand things correctly.

I'll follow your directions and contact you directly regarding any 
affirmative changes, but I thought it might be beneficial to keep this 
discussion public since some other stats hounds might find this 
information to be of use :)

If you can glean anything from the numbers that I gave you, please add 
your thoughts.

Thanks,

Matt





Pete McNeil wrote:

 At 05:00 PM 5/19/2004, you wrote:

 snip/

 I haven't yet upgraded to the most recent release, I'm still on the 
 prior beta.  I'll probably do that this evening.  I tend to wait on 
 upgrades until there has been enough time for bugs to surface unless 
 I am already looking for a fix.  I'm sure that the extra verification 
 of the rulebase will help prevent the potential of problems, and I 
 guess this has the possibility of being caused by a bit of corrupted 
 data, though that's probably reaching.


 There were no substantive changes from the beta to the production 
 version. Largely just a removal of monitoring code.

 Again, regardless if there was a blip, Sniffer still does a wonderful 
 job of tagging lots and lots of E-mail, just not quite as much as the 
 day before.


 Last night I was able to adjust the rule strength analysis window back 
 to its original settings. About 5 days of data were lost - but those 
 days will be recovered quickly. Please let me know if this adjustment 
 improved your conditions.

 I've noted that on a number of other lists there seem to be posts 
 about a sudden increase in spam over the past few days. We are 
 definitely seeing this also - approximately a 25% or more increase in 
 new rule additions in the past 4 days:

 http://www.sortmonster.com/MessageSniffer/Performance/ChangeRates.jsp 

 Specifically note from about 4 days ago...


 Days Ago   Adjustments
 --------   -----------
 0          356
 1          508
 2          391

Re: [sniffer] Possible blip?

2004-05-21 Thread Matt




Scott,

Regarding my Cyrillic and Chinese filters, I did a review of a full
week's held spam, looking for foreign languages and patterns to tag. I
found from other research that the primary Chinese character set,
GB2312, contains the Western Latin character set, and so someone could
send an E-mail with this character set defined and still have English as
the message. Because of this I do more than just look for the
offending character set: I've built a combo filter that looks for both
high bit characters such as ¥ as well as body or header hits for
encoding of GB2312 (Chinese/Korean) or Windows-1251 (Cyrillic). I also
have Declude END statements for appearances of US-ASCII and ISO-8859-1,
so messages like this one that are referencing such patterns won't trip
the filter. It seems to be stopping about 80% to 90% of the stuff, but
I'm guessing that the stuff that is getting through didn't hit one of
the high bit characters in my filter and I might need to simply expand
my list a bit. Unfortunately I have no idea what characters are most
common, so I'm just eyeballing it from sources.
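
The combination described above can be sketched roughly as follows. This is Python for illustration only, not Declude filter syntax, and every name in it is hypothetical. It also simplifies one point: it treats any byte of 0x80 or above as a hit, whereas the filter described here matches a hand-picked list of high bit characters.

```python
import re

# Hypothetical patterns; the real filter is written as Declude filter lines.
FOREIGN_CHARSETS = re.compile(r'charset\s*=\s*"?(gb2312|windows-1251)', re.IGNORECASE)
EXEMPT_CHARSETS = re.compile(r"us-ascii|iso-8859-1", re.IGNORECASE)


def has_high_bit(data: bytes) -> bool:
    """True if any byte has its top bit set (value 0x80 or above)."""
    return any(b >= 0x80 for b in data)


def combo_filter_hit(raw_message: bytes) -> bool:
    """Hit only when a targeted charset is declared AND high bit bytes appear."""
    text = raw_message.decode("latin-1")  # latin-1 maps every byte, so this never fails
    if EXEMPT_CHARSETS.search(text):
        return False  # plays the role of the END statements mentioned above
    return bool(FOREIGN_CHARSETS.search(text)) and has_high_bit(raw_message)
```

Requiring both conditions is what keeps an English message that merely declares GB2312 from being tagged on the charset alone.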

I had one false positive on a Yahoo Groups posting that referenced
163.com, a Chinese free Web mail provider that inserts Chinese language
footers. The message was in English, but encoded in GB2312 and didn't
indicate any sign of English besides the actual text. Because of this,
I might throw in an exception for the word "the " (followed by a space)
just as a test to see if text in English is present, but I have to
review that. This message was also BASE64 encoded and that might be an
appropriate exception? The last pattern that I might look at is
using the new MailPolice test for identifying Web-mail providers, and
excepting them from the filter because I've found that they have issues
with encoding languages.

Hope this helps.

Matt




Re: [sniffer] Possible blip?

2004-05-21 Thread Scott Fisher
Interesting.

Are you searching for 2 character pairs with GB2312?

Scott Fisher
Director of IT
Farm Progress Companies


Re: [sniffer] Possible blip?

2004-05-21 Thread Pete McNeil


At 01:42 PM 5/21/2004, you wrote:

Pete,

Our Hold range has returned to more normal territory on Thursday.
Here's the stats from 
snip/

One of my thoughts regarding
minimum rule strengths and grace periods is that all groups aren't
necessarily the same. For instance Nigerian scams are low volume
and sporadic, and my system performs the worst on these things.
Maybe lower rule strengths and longer grace periods make much more sense
for the Phishing category than they do for many other categories, for
instance. Is that possible?

These are definitely some things to look at - great food for new research
projects.

There is a great diversity - luckily the scanning engine has a huge
amount of headroom so most of the time we don't need to tune things very
precisely. In any of the categories you mention we see some rules die
immediately, and others seem to live on forever - often without a great
deal of reason for either case.

The fact that your hold range returned after we adjusted the rule
strength calculation window is a good indication that the relevant tuning
parameter is minimum rule strength. I noted that the previous adjustment
(changing the window from 45 to 35 days) happened precisely one month
ago. This strongly suggested that we were seeing a wave front
of sorts pass through the tuning system - so on a hunch I put it back to
45. Your report helps to support this conjecture. 

The grace period value has the greatest effect early on in a rule's life
cycle and probably shouldn't be extended beyond about 10 days. The design
of the grace period feature is that it gives a new rule time for its
rule strength to rise to the minimum threshold. After that it's all about
the performance of the rule. This sets up a competitive environment in
the system. Reaching a threshold of 1.0 currently requires that at least
19 messages fail on that rule within the analysis window and on one of
the systems that are providing logs for analysis. With about 110 logs
being consistently reported there are plenty of chances for 19 hits to
happen. 

[ an ordinary reporting system processes about 1300
messages per hour with sniffer spending about 190ms of computing time per
message (or about 7% of the available computing time). In 5 days a rule
has about 17,160,000 opportunities to kill a message. To stay
alive, a rule need only achieve a kill about .00011655% (one ten
thousandth of a percent) of the time. Of course, these numbers are a lot
like the average US family having 2.3 kids - ever seen .3 of a kid? ---
but the scale of the numbers seems right. ]
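
The arithmetic behind the bracketed estimate can be reproduced with a short sketch. The constant names are illustrative. One assumption is labeled in the code: a hit count of 20 reproduces the posted .00011655% exactly, while the 19 mentioned for a strength of 1.0 would give roughly .0001107%.

```python
MESSAGES_PER_HOUR = 1300     # per reporting system, from the post
SCAN_MS_PER_MESSAGE = 190    # Sniffer computing time per message, from the post
REPORTING_SYSTEMS = 110      # logs consistently reported, from the post
WINDOW_DAYS = 5
HITS_NEEDED = 20             # assumption: the value that reproduces the posted percentage

# Share of each hour (3,600,000 ms) spent scanning on one system.
cpu_share = MESSAGES_PER_HOUR * SCAN_MS_PER_MESSAGE / 3_600_000

# Messages scanned across all reporting systems during the window.
opportunities = MESSAGES_PER_HOUR * 24 * WINDOW_DAYS * REPORTING_SYSTEMS

# Kill rate, in percent, that a rule needs in order to collect HITS_NEEDED hits.
required_rate_pct = 100 * HITS_NEEDED / opportunities

print(f"CPU share: {cpu_share:.1%}")
print(f"Opportunities: {opportunities:,}")
print(f"Required kill rate: {required_rate_pct:.8f}%")
```
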

It could be argued that if a rule can't account for at least that
many hits across 110 systems in 5 days then it's not going to be
missed... The counter to this argument is that the spammers are driving
toward diversity to make filtering systems of all types difficult to
train and maintain -- as you noted, half of the active rules in the
default configuration are in this very low strength range.

I also looked up the rule
strengths on your site and found that about 50%, or maybe more, have a
strength below 1, and maybe lowering that is worth testing out so long as
I don't massively increase the number of records. I do think though
that I would like to test out extending the grace period. Most of
my false positives are not on things that this would affect, and that
might give niche sources a little extra coverage if I understand things
correctly.

Possibly - but I think an adjustment in the minimum rule strength will
probably suffice given the sensitivity at that range. For example, if you
adjust your minimum rule strength to 0.8 then only 10 credited kills would
be required over a period of 5 days on 110 systems in order to push the
rule above the strength threshold. Thereafter it would remain in place
for at least 45 days (with the current settings) --- each of those days
providing another opportunity to increase or maintain its
strength...

There is also another mechanism at work here --- our core system scans
every presumed ham message one more time with every rule in the system
(min rule strength 0). The log from this scan is injected into the normal
analysis so that if a message matching a deactivated rule reaches our
system through any path the strength for that rule will be raised above
0.

The second stage of the reactivation process then kicks in because our
system normally scans messages with a minimum rule strength of 0.1 - so
any messages that were being missed will continue to rise in strength if
they are seen in any volume in our spam traps or submitted spam.

Once we see 20 instances every system will begin using the reactivated
rule... Some systems will begin even before that because they are using
more sensitive settings in their rulebases - this fact helps to
accelerate the process.

Anyway, a long story short - I think the first thing to try is adjusting
the Minimum Rule Strength. This is by far the most sensitive setting -
though the two do interact dynamically - 

RE: [sniffer] Possible blip?

2004-05-20 Thread Michiel Prins



Crew,

I reported this speed issue before, but despite very intensive debugging
and testing, we have not found an external cause (meaning: not sniffer)
for the following:

When I use sniffer without the persistent flag, I get this log:

h0t861s420040520214718md5845369.msg12516Clean000284440
h0t861s420040520214718md5845370.msg11015Clean000274736
h0t861s420040520214804md5845371.msg10916Match10940662439343
h0t861s420040520214804md5845371.msg10916Match115560582286230743
h0t861s420040520214804md5845371.msg10916Final115560580358043
h0t861s420040520214825md5845372.msg11015Match29048522757278846
h0t861s420040520214825md5845372.msg11015Match122523522930294246
h0t861s420040520214825md5845372.msg11015Match122017522968297746
h0t861s420040520214825md5845372.msg11015Match122016523346335546
h0t861s420040520214825md5845372.msg11015Final29048520550446

which looks good (total execution time about 125ms).

When I have a persistent version running (max 50 ms polling time), I get:

h0t861s420040520214841md5845373.msg016Clean000359753
h0t861s420040520214852md5845374.msg1631Match1193776268474138
h0t861s420040520214852md5845374.msg1631Final119377620381038
h0t861s420040520215115md5845375.msg031Match29081632413243244
h0t861s420040520215115md5845375.msg031Final29081630945844
h0t861s420040520215134md5845376.msg094Clean0002437042
h0t861s420040520215320md5845377.msg4715Clean000194535

which are very good exec times (average 45 ms).

We have created our own program that does lots of spam checking for
messages. At some point, it fires Sniffer. We log the time it takes for
Sniffer to run, for statistical purposes. When sniffer is NOT persistent,
I get the following log snippet (same messages as 1st sniffer log above,
the second number after the .msg is the time it takes for sniffer to run):

0,"2004-05-20 23:47:18",md5845369.msg,172,157,0,15,15,0,43406,2
0,"2004-05-20 23:47:18",md5845370.msg,172,156,16,0,0,0,43309,2
0,"2004-05-20 23:48:04",md5845371.msg,188,172,0,15,0,15,3578,1
0,"2004-05-20 23:48:25",md5845372.msg,186,156,14,0,0,0,5572,1

Average time to run sniffer is 160 ms (sniffer said 125 ms). That means
sniffer can't account for about 35 ms, which is normal for application
startup and shutdown (also the log is written _after_ the exec time
calculation has been made, and file operations also take time).

But now comes the big mystery: when persistent mode is ON, it takes a lot
more time to execute (while max polling is only 50ms!)

0,"2004-05-20 23:48:41",md5845373.msg,827,812,15,0,0,0,3607,1
0,"2004-05-20 23:48:52",md5845374.msg,842,812,0,0,0,0,3833,1
0,"2004-05-20 23:51:15",md5845375.msg,936,874,0,0,0,0,9560,1
0,"2004-05-20 23:51:35",md5845376.msg,889,859,15,0,0,0,26387,0
0,"2004-05-20 23:53:21",md5845377.msg,937,922,0,15,0,15,1922,0

which averages at 850 ms! While I expected 45 + 25 ms (to compensate for
average waiting time) = 70 ms!
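
The two averages quoted here can be recomputed from the log snippets with a short Python sketch. It assumes each record begins at the leading `0,` field and that the fifth field is the Sniffer run time in milliseconds ("the second number after the .msg"); the function and variable names are illustrative only.

```python
import csv
import io

# Records copied from the two snippets, one per line (record boundaries are assumed).
NON_PERSISTENT = """\
0,"2004-05-20 23:47:18",md5845369.msg,172,157,0,15,15,0,43406,2
0,"2004-05-20 23:47:18",md5845370.msg,172,156,16,0,0,0,43309,2
0,"2004-05-20 23:48:04",md5845371.msg,188,172,0,15,0,15,3578,1
0,"2004-05-20 23:48:25",md5845372.msg,186,156,14,0,0,0,5572,1
"""

PERSISTENT = """\
0,"2004-05-20 23:48:41",md5845373.msg,827,812,15,0,0,0,3607,1
0,"2004-05-20 23:48:52",md5845374.msg,842,812,0,0,0,0,3833,1
0,"2004-05-20 23:51:15",md5845375.msg,936,874,0,0,0,0,9560,1
0,"2004-05-20 23:51:35",md5845376.msg,889,859,15,0,0,0,26387,0
0,"2004-05-20 23:53:21",md5845377.msg,937,922,0,15,0,15,1922,0
"""


def mean_sniffer_ms(log_text):
    """Average of the fifth CSV field, taken to be the Sniffer run time in ms."""
    times = [int(row[4]) for row in csv.reader(io.StringIO(log_text)) if row]
    return sum(times) / len(times)


print(mean_sniffer_ms(NON_PERSISTENT))
print(mean_sniffer_ms(PERSISTENT))
```

On these nine records the means come out at roughly 160 ms and 856 ms, in line with the figures quoted in the message.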

Pete, could you please check why this is happening (particularly in code
OUTSIDE what's measured and logged)? If you can't find anything, I'll ask
my colleague to come up with a timing program, which I would like to
release on this list so other people can check how long it really takes to
execute sniffer (measured from 'the outside').
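
A timing program of the kind mentioned here could look roughly like the following Python sketch. It is not the program from this thread, and `time_command` is a made-up name. It only measures wall-clock time around an external command, so process startup and shutdown are included in the result.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv several times; return wall-clock durations in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded so console I/O does not distort the measurement.
        subprocess.run(argv, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


# Placeholder command: substitute the scanner command line used on your system.
sample = time_command([sys.executable, "-c", "pass"], runs=3)
print(f"min {min(sample):.0f} ms, mean {sum(sample) / len(sample):.0f} ms, "
      f"max {max(sample):.0f} ms")
```

Comparing these outside measurements with the times the scanner logs for itself would show how much time is spent outside the logged interval.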

Regards,


ing. Michiel Prins
SOS Small Office Solutions / REJECT
Wannepad 27
1066 HW Amsterdam
tel. 020-4082627
fax. 020-4082628

[EMAIL PROTECTED]

Spam-free business e-mail? reject.nl!

Consultancy - Installation - Maintenance
Network Security - Project Management
Software Development - Internet - E-mail



Re: [sniffer] Possible blip?

2004-05-19 Thread Pete McNeil
At 12:57 PM 5/19/2004, you wrote:

Pete,

I noted late last night that my rulebase grew by 700 KB over the size of 
the previous one that was archived on my machine, and also the hits for 
some of the tests were noticeably lower and I had a definite increase in 
the number of messages that scored in my Hold range (instead of scoring 
higher and landing in Drop).  This morning though the size of my rulebase 
again dropped by about 450 KB.

I was just wondering if this might have been a hiccup with a bad 
compilation or maybe you were testing something out?

We didn't have anything under test that would alter the rulebases. I'm 
going to dig through the logs and see if there's anything I can identify.

If the rulebase was corrupted in any way you would have been able to detect 
that with the latest snf2check utility.

It's not unusual for rulebase sizes to change by as much as 20%. The system 
is constantly activating and deactivating rules based on new log files that 
are reported. Currently a significant change might occur once per day - 
though we are working on new analysis engines that will permit more 
frequent rule strength adjustments.

For example, we might add 300-900 rules over the course of a day - then 
have that many (or more) removed when the new rule strength numbers are 
calculated.

Another factor that impacts rulebase size is the content of the rules. The 
folding process is not deterministic so it is possible for a few rule 
changes to significantly alter the way the rulebase file is folded. This is 
less likely to be the change but it is possible.

What was the date on the archive you used to compare sizes?
_M
This E-Mail came from the Message Sniffer mailing list. For information and (un)subscription instructions go to http://www.sortmonster.com/MessageSniffer/Help/Help.html


Re: [sniffer] Possible blip?

2004-05-19 Thread Matt
Pete,
I was judging based on the size of our Hold range which scores from 
10-24.  On Monday that was 1.86% of total traffic, but on Tuesday that 
was 2.83%.  Message volume was hardly different.  Other notables were 
that on Monday, Sniffer hit 77.27% of all E-mail but on Tuesday it hit 
74.53% (both exclude Gray hits).  Our overall spam percentage was about 
82% on Monday and 81% on Tuesday.  I did also see a drop in XBL hits, 
which are primarily zombies, from 38.14% to 34.93%.  I've always found 
static spammers to be much more problematic because they lack many 
spammy patterns, and it could be that there was a wave of them that came 
online yesterday which could account for the difference.

I don't want to make a huge deal out of this, but I noted the drop in 
size from one rulebase to another and thought that might be significant, 
and I like to be aware of what is going on.  In reality though the 
difference in percentages in our Hold file meant manually reviewing 50% 
more E-mails, or about 500 extra messages.  With everything else 
consistent, I figured it was worth a post just to check.
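
The "50% more" figure follows directly from the two Hold percentages, as this small Python sketch shows. The names are illustrative, and the daily volume it derives is an inference from the numbers in this message, not something stated in it.

```python
hold_monday = 1.86     # % of total traffic in the Hold range on Monday
hold_tuesday = 2.83    # % of total traffic in the Hold range on Tuesday
extra_messages = 500   # "about 500 extra messages", from the message

# Relative growth of the Hold range, assuming similar volume on both days.
relative_increase = (hold_tuesday - hold_monday) / hold_monday

# Inferred, not stated: the daily volume at which a 0.97-point rise
# in %Hold corresponds to roughly 500 messages.
implied_daily_volume = extra_messages / ((hold_tuesday - hold_monday) / 100)

print(f"Relative increase: {relative_increase:.0%}")
print(f"Implied daily volume: about {round(implied_daily_volume, -3):,.0f} messages")
```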

I do recall an old posting where you indicated that you were going to 
drop the expiration down to 5 days under a certain number of hits.  My 
thought there is that while it does present some savings in processing, 
it might make more sense to do a 7-8 day expiration in order to help 
catch spammers that are on weekly schedules, primarily lower volume 
niche spammers.  Unfortunately I can't compare my current results 
accurately to the pre-change data because the makeup of my traffic has 
changed significantly over that time frame.

Another possibility is that our Chinese language spam might have been 
extra heavy.  I've brought in much more of that recently from a couple 
different clients and it regularly scores low, probably because it's 
difficult to determine if most of it is spam.  I do know that Sniffer 
doesn't do nearly as well with this stuff.  I've noticed that these guys 
are spamming mostly during Chinese business hours, and they might have 
been extra light on Monday due to the lag in hours coming from a 
weekend.  If you are interested in getting these caught messages 
forwarded to you in an automated fashion for study or for potential 
inclusion, just let me know.  I also have a filter set up for Russian 
language E-mail, but it is not nearly as high in volume (now).

Regarding when I saw the changes in the rule base, I was pulling an 
all-nighter for server administration and noticed this around 5 a.m. 
when I ran the stats program on my Declude logs.  The renamed 'old' 
rulebase was just over 4 MB while the active one was 4.7 MB, then at 
about noon I noticed it was about 4.3 MB, and now it's back up over 4.7 
MB (1,000 KB = 1 MB in these stats if that matters).

I haven't yet upgraded to the most recent release, I'm still on the 
prior beta.  I'll probably do that this evening.  I tend to wait on 
upgrades until there has been enough time for bugs to surface unless I 
am already looking for a fix.  I'm sure that the extra verification of 
the rulebase will help prevent the potential of problems, and I guess 
this has the possibility of being caused by a bit of corrupted data, 
though that's probably reaching.

Again, regardless if there was a blip, Sniffer still does a wonderful 
job of tagging lots and lots of E-mail, just not quite as much as the 
day before.

Thanks,
Matt
