>> On Sunday 14 February 2010 12:25:40 Stevan Bajić wrote:
>>> On Sun, 14 Feb 2010 11:49:20 +0000
>>>
>>> Kārlis Repsons <[email protected]> wrote:
>>> > I know it depends on quite many factors in total, but anyway, could we
>>> > make a small list of values and info in here like this:
>>>
>>> what do you mean? We all here should submit our values?
>> Presuming, that my variables list was sufficiently complete + significant
>> to understand what total diskspace dspam can take up in what case -- yes!
>> Otherwise correct it...
>>
Okay. To compute the size you need for the database, you need the
following figures:

* Purging strategy. How many days do you want to keep the data so that
users can retrain? (With SQL purging this would be 14 days.)

* Volume of INBOUND mail, in bytes, that you receive during the purge
period. (With SQL purging this would be 14 days.)

* Number of INBOUND mails you receive during the purge period. (As above:
14 days is the default.)

* Tokenizer used in DSPAM.



An example:
* Purging daily keeping 14 days of signatures
* Amount of INBOUND mails in 14 days: 14'680'064 bytes
* Used tokenizer: OSB

Now assume that the average word length is just 5 characters. Those
14'680'064 bytes would then contain roughly 2'446'678 words (5 bytes per
word + one character for the word boundary = 6 bytes). Now assume that
all of those roughly 2.4 million words, or word sequences, are unique. This
would result in ( 2'446'678 - 5 ) * 4 = 9'786'692 tokens for OSB.
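
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the worst-case estimate, not anything DSPAM itself provides: the function name and its parameters are made up for this example, and the window of 5 words with 4 tokens per position is simply taken from the formula above.

```python
def estimate_osb_tokens(inbound_bytes, avg_word_len=5, window=5):
    """Worst-case OSB token estimate, assuming every token is unique.

    Hypothetical helper: mirrors the back-of-the-envelope formula
    (words - window) * (window - 1) from the text.
    """
    # Each word costs its characters plus one byte for the word boundary.
    bytes_per_word = avg_word_len + 1
    # Round up so a trailing partial word still counts as a word.
    words = -(-inbound_bytes // bytes_per_word)
    # Every window position contributes (window - 1) tokens.
    tokens = max(words - window, 0) * (window - 1)
    return words, tokens


words, tokens = estimate_osb_tokens(14_680_064)
print(f"words:  {words}")
print(f"tokens: {tokens}")
```

With the example figures this gives 2'446'678 words and 9'786'692 tokens, the same numbers as above.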

Now depending on what database schema you have, you could compute the
total amount needed for the table "dspam_token_data" to hold those +/- 10
million tokens.

The size needed for "dspam_signature_data" will be no more than the
amount of INBOUND mail, i.e. about 14 MB in this example (14'680'064 bytes).

This should give you a base number for your setup. And like every good
system admin you should plan for the future. You surely have statistical
data lying around somewhere about the growth in INBOUND mail you have
seen in the past. Use those numbers to work out what you expect for the
near future, and use that to compute the storage needed for DSPAM.

And to be on the safe side I would suggest multiplying that number by
1.5 or 2.0 so that you have room for unexpected growth.
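
As a rough sketch, the whole estimate (token table + signature table, then expected growth, then the safety factor) could look like this in Python. The function and parameter names are invented for this example. The 40 bytes per token row and the 25% growth are pure placeholders: the real row size depends on your database schema and storage backend, and the growth rate has to come from your own statistics.

```python
import math


def plan_storage(token_count, inbound_bytes, bytes_per_token_row,
                 growth_rate=0.0, safety_factor=1.5):
    """Rough upper bound, in bytes, for the DSPAM database.

    Hypothetical helper; all inputs are estimates supplied by the admin.
    """
    # Token table: worst-case token count times the per-row cost.
    token_table = token_count * bytes_per_token_row
    # Signature table: bounded by the INBOUND volume in the purge period.
    signature_table = inbound_bytes
    base = token_table + signature_table
    # Apply expected growth first, then the safety margin on top of it.
    projected = base * (1 + growth_rate)
    return math.ceil(projected * safety_factor)


# Placeholder values: 40 bytes/row and 25% growth are assumptions only.
needed = plan_storage(9_786_692, 14_680_064,
                      bytes_per_token_row=40,
                      growth_rate=0.25,
                      safety_factor=2.0)
print(f"plan for about {needed / 1024**2:.0f} MiB")
```

Swap in your own row size and growth figures; the structure of the calculation stays the same.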

That's how I would do the computation. Asking others here how much
space they use is not going to help you much. Every setup is
different.

The numbers I mentioned above are way, way, way too big. You usually don't
have 100% new tokens for each and every message. But it is better to
compute the worst possible scenario and use that as your absolute upper
bound than to compute everything with overly optimistic values and later
realize that you need to upgrade your hardware.


_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user