I do not understand this discussion! All of these wishes are already built into ASSP, except removing mails with the same subject. I do not like that idea, because rebuildspamdb ignores the subject: only the body is used, and mails with the same body are ignored (except one) and will be deleted 60 days later.
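To illustrate what "only the body is used" means for duplicates, here is a minimal sketch in Python (not ASSP's Perl, and not the actual rebuildspamdb code). It assumes the body is everything after the first blank line and that two mails count as duplicates when their bodies are byte-identical. The names `body_of` and `unique_by_body` are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple


def body_of(raw: bytes) -> bytes:
    """Return everything after the first blank line (the header/body separator)."""
    normalized = raw.replace(b"\r\n", b"\n")
    _headers, _sep, body = normalized.partition(b"\n\n")
    return body


def unique_by_body(messages: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Keep the first file name for each distinct body; headers are ignored."""
    seen = set()
    kept = []
    for name, raw in messages:
        digest = hashlib.md5(body_of(raw)).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept
```

With this rule, two mails that differ only in their Subject header collapse into one entry, which is why filtering on the subject would add nothing.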
-------------------------------------------
['MaintBayesCollection','Maintenance for Bayesian Collection',0,\&checkbox,'','(.*)',undef,
  'Set this to on, if you want ASSP to run a maintenance tasks on the bayesian collection folders ( spamlog , notspamlog , correctedspam , correctednotspam ). ASSP will delete the oldest files until the number of files per folder reaches MaxFiles. If you want ASSP to delete files because of their age instead of the number of files ( MaxFiles ), setup MaxBayesFileAge and/or MaxCorrectedDays to your needs.<br /> This option is usefull, if UseSubjectsAsMaillogNames is set to on and doMove2Num is set to off, because in this case the number of files in every collection folder will grow infinite.',undef,undef,'msg006140','msg006141'],
['MaxBayesFileAge','Max Age of Bayes Files',10,\&textinput,0,'(\d+)',undef,
  'The maximum file age in days of every file in every bayesian collection folder ( spamlog , notspamlog ). If MaintBayesCollection is set to on and a file is older than this number in days, the file will be deleted. Default is 0. A value of 0 disables this feature and no file will be deleted because of its age.<br /> <span class = "negative">Do not define this option, if you use the bayesian engine of ASSP. Deleting files because of there age, is wrong in this case!!!!!</span>',undef,undef,'msg006150','msg006151'],
['MaxCorrectedDays','Max Corrected File Age',5,\&textinput,'1000','(\d+)',undef,
  'This is the number of days a error report will be kept in the correctednotspam and correctedspam folders. These folders are the longterm memory of ASSP, therefore the default is 1000 days. ',undef,undef,'msg008590','msg008591'],
['MaxNoBayesFileAge','Max Age of non Bayes Files',10,\&textinput,0,'(\d+)',undef,
  'The maximum file age in days of every file in every non bayesian collection folder ( incomingOkMail , discarded , viruslog ). If defined and a file is older than this number in days, the file will be deleted. Default is 0. A value of 0 disables this feature and no file will be deleted because of its age.',undef,undef,'msg006160','msg006161'],
---------------------------------------------

If MaintBayesCollection is set to on, it is your choice to set the rest to your needs:

- MaxBayesFileAge/MaxNoBayesFileAge == 0: reduce the number of files to MaxFiles by deleting the oldest
- MaxBayesFileAge/MaxNoBayesFileAge != 0: reduce the number of files by deleting all that are older than XX days
- MaxCorrectedDays: these files should never be deleted (use 1000000)

And keep in mind: if the number of files per folder is reduced to MaxFiles at 1:00 AM and rebuildspamdb runs at 11:00 PM, rebuildspamdb may have to process many more than MaxFiles files!

Currently there is a mistake in this maintenance task: the files with the file date set to 60 days in the future are the last files that will be deleted. This will be fixed in 4.14.

Thomas

"GrayHat" <gray...@gmx.net>
15.09.2009 18:35
Please reply to: GrayHat <gray...@gmx.net>; ASSP development mailing list <assp-test@lists.sourceforge.net>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Cc:
Subject: Re: [Assp-test] Antwort: Re: Antwort: Re: Antwort: Re:fixesandnewsin 2.0.1_RC0.4.12

>> Hmm... that sounds like an idea which was brought on some
>> time ago (John was still the dev for ASSP at the time); that
>> is, set up some kind of TTL parameter for corpus files so
>> that the spamdb rebuild should check the file date/time and
>> if over the TTL (say "n" days) it should then delete the file.

> My thought is that the "TTL" would only be in effect for the purpose
> of keeping BlockReporting working (for however many days or
> weeks you wish the emails to be guaranteed resendable).
> After that time, the TTL is null and the files are game for
> replacement. I thought it a simple idea for working around
> the BlockReporting problem Thomas mentioned.

I see, but there's no need to store anything along with the files; the regular filesystem timestamp of each file will work just fine. Just remove every file for which "(today - filetime) > TTL".

> On a low-to-medium traffic box, though, this would not be a
> problem. We already deal with bunches of identical
> messages from time-to-time (nothing new).

There may be a solution for that too: assuming the spam and notspam folders get cleaned up using the TTL, the files may be saved using (e.g.) an MD5 hash (or the like) as the name, so that identical messages won't be stored more than once. By the way, that may have some side effects and may need some more thinking, but...

>> Bottom line; the bayes filter should work by /learning/ this
>> means that it should NOT discard the previous data, but
>> rather REFINE them from further data coming in; so maybe the
>> whole bayes approach used inside ASSP should be revised NOT
>> to deal just with the latest data but to learn/improve during time

> Just an idea, but how do you "NOT" discard data while keeping
> rebuild times low and maintaining free hard drive space
> (realistically)?

Using some kind of "digest" of the previous bases stored in a more compact format.
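
The two ideas in the reply above (purging by filesystem timestamp and naming files after an MD5 hash) could be sketched roughly as follows. This is only an illustration in Python, not ASSP code. The function names, the `.eml` suffix and the choice to hash the whole message are assumptions made for the example.

```python
import hashlib
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def purge_expired(folder: Path, ttl_days: float,
                  now: Optional[float] = None) -> List[Path]:
    """Delete files for which (now - mtime) > TTL and return the deleted paths."""
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so unlinking does not disturb the iteration.
    for path in sorted(folder.iterdir()):
        if path.is_file() and now - path.stat().st_mtime > ttl_days * SECONDS_PER_DAY:
            path.unlink()
            removed.append(path)
    return removed


def store_by_digest(folder: Path, message: bytes) -> Path:
    """Save the message under its MD5 hex digest; an identical message is stored once."""
    target = folder / (hashlib.md5(message).hexdigest() + ".eml")
    if not target.exists():
        target.write_bytes(message)
    return target
```

In this sketch a repeated message leaves the existing file and its timestamp untouched, so the TTL counts from the first time the message was seen. Whether the timestamp should be refreshed instead is one of the side effects that would need more thinking.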
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test