On 2015-03-24 17:04, k...@rice.edu wrote:
> On Tue, Mar 24, 2015 at 03:57:01PM -0300, j...@7lan.net wrote:
>> Hi,
>> 
>> I'm using DSPAM in a 100,000-user e-mail setup. I run 6 mail
>> servers with dspam and the hash driver. The database is kept on an NFS
>> share and it seems to work fine.
>> 
>> I'm using TOE training mode, since I have amavisd-new in this setup
>> doing black/whitelisting and common blocks. My users can teach ham and
>> spam messages to dspam automatically.
>> 
>> My questions:
>> 
>> Is TOE the training mode that uses the least disk space?
>> Which hash driver settings should I use? My database is over 100GB
>> right now and growing fast.
>> What is the best practice for database maintenance?
>> 
>> These are my settings:
>> 
>> HashRecMax              98317
>> HashAutoExtend          on
>> HashMaxExtents          0
>> HashExtentSize          49157
>> HashPctIncrease         10
>> HashMaxSeek             10
>> HashConnectionCache     10
>> 
>> 
>> PurgeSignatures 14          # Stale signatures
>> PurgeNeutral    90          # Tokens with neutralish probabilities
>> PurgeUnused     90          # Unused tokens
>> PurgeHapaxes    30          # Tokens with less than 5 hits (hapaxes)
>> PurgeHits1S     15          # Tokens with only 1 spam hit
>> PurgeHits1I     15          # Tokens with only 1 innocent hit
>> 
>> 
>> I disabled dspam_clean and dspam_logrotate on the dspam servers,
>> and run them directly on the file server instead.
>> 
>> I tried the PostgreSQL driver, but it used a lot of resources.
>> 
>> Can you guys give me some suggestions? The database is getting bigger
>> and I don't know if I'm doing the best maintenance routine.
>> 
>> Thanks!
> 
> Hi,
> 
> I would be leery of using the hash backend for a system with that many
> users using individual training. You are only using ~1MB/user. What
> tokenizer are you using? I would expect you to need much more room per
> user as the training progresses, 10-100MB each. I think your disk usage
> is going to continue to increase to the point that using a PostgreSQL
> backend would make sense. How are you planning to handle a hash file
> becoming corrupt?
> 
> Regards,
> Ken
> 
I'm using the osb tokenizer. The database is "new", which is why it is
so small today.
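
For reference, here is a rough Python sketch of the arithmetic behind
those numbers. The 100,000 users and 100GB are my current figures and the
10-100MB per user range is Ken's estimate; the function name and the use
of decimal units (1GB = 1000MB) are only my assumptions for illustration.

```python
# Rough storage projection for per-user training data.
# Assumption: decimal units (1 GB = 1000 MB), names are illustrative only.

USERS = 100_000      # number of mailboxes (from this thread)
CURRENT_GB = 100     # approximate size of the hash database today


def project_storage_gb(users, mb_per_user):
    """Return the total storage in GB for `users` at `mb_per_user` MB each."""
    return users * mb_per_user / 1000


# Average footprint per user today, in MB.
current_mb_per_user = CURRENT_GB * 1000 / USERS

if __name__ == "__main__":
    # Compare today's footprint with the low and high ends of the estimate.
    for mb in (current_mb_per_user, 10, 100):
        total_gb = project_storage_gb(USERS, mb)
        print(f"{mb:6.1f} MB/user -> {total_gb:8.0f} GB total")
```

So if the estimate holds, the data would grow from about 100GB today to
somewhere between 1TB and 10TB.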

I'm planning to move it to an SQL backend. Which database does dspam work
best with? I saw some PostgreSQL schema optimizations, but maybe MySQL is
less resource-hungry? What are your experiences?

Thanks!



_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
