I do not understand the discussion!

All of these wishes are already built into ASSP, except removing mails with
the same subject. I do not like this idea, because rebuildspamdb ignores the
subject: only the body is used, and of several mails with the same body all
but one are ignored and will be deleted 60 days later.

-------------------------------------------
['MaintBayesCollection','Maintenance for Bayesian Collection',0,\&checkbox,'','(.*)',undef,
  'Set this to on if you want ASSP to run maintenance tasks on the bayesian collection folders ( spamlog , notspamlog , correctedspam , correctednotspam ). ASSP will delete the oldest files until the number of files per folder reaches MaxFiles. If you want ASSP to delete files because of their age instead of the number of files ( MaxFiles ), set up MaxBayesFileAge and/or MaxCorrectedDays to your needs.<br />
  This option is useful if UseSubjectsAsMaillogNames is set to on and doMove2Num is set to off, because in this case the number of files in every collection folder will grow indefinitely.',undef,undef,'msg006140','msg006141'],

['MaxBayesFileAge','Max Age of Bayes Files',10,\&textinput,0,'(\d+)',undef,
  'The maximum file age in days of every file in every bayesian collection folder ( spamlog , notspamlog ). If MaintBayesCollection is set to on and a file is older than this number of days, the file will be deleted. Default is 0. A value of 0 disables this feature and no file will be deleted because of its age.<br />
  <span class = "negative">Do not define this option if you use the bayesian engine of ASSP. Deleting files because of their age is wrong in this case!</span>',undef,undef,'msg006150','msg006151'],

['MaxCorrectedDays','Max Corrected File Age',5,\&textinput,'1000','(\d+)',undef,'This is the number of days an error report will be kept in the correctednotspam and correctedspam folders. These folders are the long-term memory of ASSP, therefore the default is 1000 days. ',undef,undef,'msg008590','msg008591'],

['MaxNoBayesFileAge','Max Age of non Bayes Files',10,\&textinput,0,'(\d+)',undef,
  'The maximum file age in days of every file in every non bayesian collection folder ( incomingOkMail , discarded , viruslog ). If defined and a file is older than this number of days, the file will be deleted. Default is 0. A value of 0 disables this feature and no file will be deleted because of its age.',undef,undef,'msg006160','msg006161'],
---------------------------------------------

If MaintBayesCollection is set to on, it is your choice to set the rest to
your needs:

- MaxBayesFileAge/MaxNoBayesFileAge == 0: reduce the number of files to MaxFiles by deleting the oldest
- MaxBayesFileAge/MaxNoBayesFileAge != 0: reduce the number of files by deleting all files older than XX days

- MaxCorrectedDays: these files should never be deleted (use 1000000)
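For readers who want the two rules spelled out, here is a minimal Python sketch. It is not ASSP's code (ASSP is written in Perl); the function maintain_folder and its parameters are hypothetical, with max_files standing for MaxFiles and max_age_days for MaxBayesFileAge/MaxNoBayesFileAge, and it assumes a file's age is its filesystem modification time.

```python
import os
import time


def maintain_folder(folder, max_files, max_age_days, now=None):
    """Hypothetical sketch of the collection maintenance rules.

    max_age_days == 0 -> trim the folder to max_files, oldest first
    max_age_days != 0 -> delete every file older than max_age_days
    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    entries = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            entries.append((os.path.getmtime(path), name, path))

    doomed = []
    if max_age_days == 0:
        entries.sort()  # oldest modification time first
        excess = len(entries) - max_files
        if excess > 0:
            doomed = entries[:excess]
    else:
        cutoff = now - max_age_days * 86400  # days -> seconds
        doomed = [e for e in entries if e[0] < cutoff]

    for _, _, path in doomed:
        os.remove(path)
    return sorted(name for _, name, _ in doomed)
```

For the correctedspam/correctednotspam folders, max_age_days would be MaxCorrectedDays; with a value like 1000000 the age branch effectively never deletes anything.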

And keep in mind: if the number of files per folder is reduced to
MaxFiles at 1:00 AM and rebuildspamdb runs at 11:00 PM,
rebuildspamdb may have to process many more than MaxFiles files!

Currently there is a mistake in this maintenance task: the files whose
file date is set 60 days in the future are the last files to be deleted.
This will be fixed in 4.14.
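To illustrate the ordering problem: if files are trimmed strictly oldest-first by modification time, a file dated 60 days in the future always sorts last. The sketch below assumes, as described above, that such future-dated files are the duplicates rebuildspamdb has already marked for deletion, and moves them to the front of the queue. The function deletion_order is hypothetical and does not describe how 4.14 actually fixes this.

```python
import time


def deletion_order(entries, now=None):
    """Order (mtime, name) pairs for oldest-first trimming.

    Assumption: a file whose mtime lies in the future was marked by
    rebuildspamdb as an ignored duplicate, so it goes to the front of
    the deletion queue instead of the end.
    """
    now = time.time() if now is None else now
    future = sorted(e for e in entries if e[0] > now)
    normal = sorted(e for e in entries if e[0] <= now)
    return [name for _, name in future + normal]
```

A plain sort on the modification time would put the future-dated file after every normal file, which is exactly the behaviour described as the mistake.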

Thomas






"GrayHat" <gray...@gmx.net> 
15.09.2009 18:35
Please reply to
GrayHat <gray...@gmx.net>; Please reply to
ASSP development mailing list <assp-test@lists.sourceforge.net>


To
"ASSP development mailing list" <assp-test@lists.sourceforge.net>
Cc

Subject
Re: [Assp-test] Reply: Re: Reply: Re: Reply: Re: fixes and news in 2.0.1_RC0.4.12






>> Hmm... that sounds like an idea which was brought up some 
>> time ago (John was still the dev for ASSP at the time); that 
>> is, set up some kind of TTL parameter for corpus files so 
>> that the spamdb rebuild should check the file date/time and 
>> if over the TTL (say "n" days) it should then delete the file.

> My thought is that the "TTL" would only be in effect for the purpose
> of keeping BlockReporting working (for however many days or
> weeks you wish the emails to be guaranteed resendable).
> After that time, the TTL is null and the files are game for 
> replacement.  I thought it a simple idea for working around
> the BlockReporting problem Thomas mentioned.

I see, but there's no need to store anything along with the files;
the regular filesystem timestamp of each file will work just
fine: simply remove every file for which "(today - filetime) > TTL".
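The TTL check needs nothing beyond the file's own timestamp. A small Python sketch, assuming the modification time is the "filetime" and the TTL is given in days; expired_files is a hypothetical helper that only lists candidates and deletes nothing.

```python
import os
import time


def expired_files(folder, ttl_days, now=None):
    """List files in folder for which (now - filetime) > TTL.

    No extra metadata is stored: the file's modification time is the
    file time. Nothing is deleted here; the caller decides what to do.
    """
    now = time.time() if now is None else now
    ttl_seconds = ttl_days * 86400
    expired = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > ttl_seconds:
            expired.append(name)
    return expired
```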

> On a low-to-medium traffic box, though, this would not be a
>  problem. We already deal with bunches of identical 
> messages from time-to-time (nothing new).

There may be a solution for that too: assuming the spam and
notspam folders get cleaned up using the TTL, the files could
be saved using (e.g.) an MD5 hash (or the like) as the name,
so that identical messages won't be stored more than once;
that may have some side effects and may need some more
thinking, though...
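A sketch of the hash-as-filename idea, assuming the message body is what gets hashed (since rebuildspamdb only looks at the body) and that files get a hypothetical .eml suffix. Identical bodies map to the same name, so a second copy is simply not written; store_message is a hypothetical helper.

```python
import hashlib
import os


def store_message(folder, body: bytes) -> tuple[str, bool]:
    """Save body under its MD5 hex digest; identical bodies share one file.

    Returns (path, created); created is False if a copy already existed.
    """
    name = hashlib.md5(body).hexdigest() + ".eml"
    path = os.path.join(folder, name)
    if os.path.exists(path):
        return path, False
    with open(path, "wb") as fh:
        fh.write(body)
    return path, True
```

One of the side effects mentioned above: with TTL cleanup the file keeps the timestamp of its first copy, so a frequently repeated spam would still expire after TTL days unless its timestamp is refreshed on each duplicate.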

>> Bottom line: the bayes filter should work by /learning/; this 
>> means that it should NOT discard the previous data, but 
>> rather REFINE it with further data coming in; so maybe the 
>> whole bayes approach used inside ASSP should be revised NOT 
>> to deal just with the latest data but to learn/improve over time

> Just an idea, but how do you "NOT" discard data while keeping 
> rebuild times low and maintaining free hard drive space 
> (realistically)? 

Using some kind of "digest" of the previous bases, stored in a
more compact format.
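One way such a digest could look, sketched in Python: keep per-token spam/ham counts and fold each new batch of messages into them, so the raw corpus files can be removed once they are counted. This is a generic token-count approach swapped in for illustration, not ASSP's spamdb format; update_digest and the whitespace tokenizer are hypothetical.

```python
def tokenize(text):
    # Hypothetical tokenizer: lower-cased, whitespace-separated words.
    return text.lower().split()


def update_digest(digest, messages, is_spam):
    """Add token counts from new messages to a persistent digest.

    digest maps token -> [spam_count, ham_count]; a rebuild then only
    has to process new mail instead of the whole corpus.
    """
    column = 0 if is_spam else 1
    for text in messages:
        for token in set(tokenize(text)):  # count each token once per message
            digest.setdefault(token, [0, 0])[column] += 1
    return digest
```

Counting a token once per message keeps the counts in the "number of messages containing the token" form that per-token spam probabilities are usually estimated from.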



------------------------------------------------------------------------------
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test




DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally 
privileged and protected in law and are intended solely for the use of the 
individual to whom it is addressed.
This email was scanned for viruses multiple times. There should be no 
known virus in this email!
*******************************************************
