>> On Sunday 14 February 2010 12:25:40 Stevan Bajić wrote:
>>> On Sun, 14 Feb 2010 11:49:20 +0000
>>> Kārlis Repsons <[email protected]> wrote:
>>> > I know it depends on quite many factors in total, but anyway, could we
>>> > make a small list of values and info in here like this:
>>>
>>> what do you mean? We all here should submit our values?
>> Presuming, that my variables list was sufficiently complete + significant
>> to understand what total diskspace dspam can take up in what case -- yes!
>> Otherwise correct it...
>>

Okay. In order to compute the size you need for the database, you need to have the following numbers/figures:
* Strategy of purging. How many days do you want to keep the data so that users can retrain? (Using SQL purging this would be 14 days.)
* Amount of INBOUND mail in bytes you get within the purge range. (Using SQL purging this would be 14 days.)
* Count of INBOUND mails you get during the purge range. (As above: 14 days is the default.)
* Tokenizer used in DSPAM.

An example:
* Purging daily, keeping 14 days of signatures
* Amount of INBOUND mail in 14 days: 14'680'064 bytes
* Used tokenizer: OSB

Now assume that the average word length is just 5 characters. Then those 14'680'064 bytes would result in +/- 2'446'678 words (5 bytes for a word + one character for a word boundary = 6 bytes per word).

Now assume that those roughly 2.4 million words or word orders would all be unique. Then this would result in:

( 2'446'678 - 5 ) * 4 = 9'786'692 tokens for OSB

Now, depending on what database schema you have, you can compute the total amount needed for the table "dspam_token_data" to hold those +/- 10 million tokens. The size needed for "dspam_signature_data" will be no more than the amount of INBOUND mail, i.e. 14'680'064 bytes (about 14 MB) in this example.

This should give you a base number for your setup. And like every good system admin, you should plan for the future. You surely have statistical data lying around somewhere about the growth you had in the past regarding INBOUND mail. Use those numbers to compute what you expect for the near future, and from that compute the needed storage for DSPAM. And to be on the safe side, I would suggest multiplying that number by 1.5 or 2.0 so that you have room for unexpected growth.

That's how I would do that computation. Asking others here how much space they use is not going to bring you big benefits. Every setup is different.

The numbers I mentioned above are way, way, way too big. You usually don't have 100% new tokens for each and every message.
But it's never bad to compute the worst possible scenario and use that as your absolute highest number. That is better than computing everything with overly optimistic values and realizing later that you need to upgrade your hardware.
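
For anyone who wants to plug in their own figures, the worst-case arithmetic from this message can be sketched in a few lines of Python. This is only an illustration of the estimate, not anything shipped with DSPAM. The function name, the parameter names and the optional `bytes_per_token_row` value are assumptions made up for this sketch; the constants 5 and 4 are simply the ones used in the OSB formula above.

```python
def worst_case_estimate(inbound_bytes, avg_word_len=5, window_words=5,
                        tokens_per_word=4, headroom=2.0,
                        bytes_per_token_row=None):
    """Worst-case DSPAM storage estimate for one purge window.

    inbound_bytes: INBOUND mail volume in bytes over the purge range.
    window_words / tokens_per_word: the 5 and 4 from the OSB formula
    in the message. bytes_per_token_row is a hypothetical, schema-specific
    figure you have to measure yourself; it is not a DSPAM constant.
    """
    # One extra byte per word for the word boundary.
    bytes_per_word = avg_word_len + 1
    # Round up, as the message does (14'680'064 / 6 -> 2'446'678).
    words = (inbound_bytes + bytes_per_word - 1) // bytes_per_word
    # Assume every word sequence is unique: the absolute worst case.
    tokens = max(words - window_words, 0) * tokens_per_word
    result = {
        "words": words,
        "tokens": tokens,
        # Signature data is bounded by the inbound volume itself.
        "signature_bytes": inbound_bytes,
        # Room for unexpected growth (1.5 or 2.0 as suggested).
        "planned_signature_bytes": int(inbound_bytes * headroom),
    }
    if bytes_per_token_row is not None:
        token_bytes = tokens * bytes_per_token_row
        result["token_table_bytes"] = token_bytes
        result["planned_token_table_bytes"] = int(token_bytes * headroom)
    return result


if __name__ == "__main__":
    estimate = worst_case_estimate(14_680_064)
    print(estimate["words"])   # 2446678
    print(estimate["tokens"])  # 9786692
```

To size "dspam_token_data", pass a per-row size measured on your own database schema as `bytes_per_token_row`. To plan ahead, feed in the inbound volume you expect in the near future rather than today's figure.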
