On 2015-03-24 17:04, k...@rice.edu wrote:
> On Tue, Mar 24, 2015 at 03:57:01PM -0300, j...@7lan.net wrote:
>> Hi,
>>
>> I'm using DSPAM in a 100,000-user e-mail structure. I run 6 mail
>> servers with dspam using the hash driver. The database is kept on an
>> NFS share and it seems to work fine.
>>
>> I'm using TOE training mode, since I have amavisd-new in this
>> structure doing black/whitelisting and common blocks. My users can
>> teach ham and spam messages to dspam automatically.
>>
>> My questions:
>>
>> Is TOE the training mode that uses the least disk space?
>> What hash driver configuration should I use? My database is over
>> 100 GB right now and growing fast.
>> What is the best practice for database maintenance?
>>
>> These are my settings:
>>
>> HashRecMax 98317
>> HashAutoExtend on
>> HashMaxExtents 0
>> HashExtentSize 49157
>> HashPctIncrease 10
>> HashMaxSeek 10
>> HashConnectionCache 10
>>
>>
>> PurgeSignatures 14 # Stale signatures
>> PurgeNeutral 90 # Tokens with neutralish probabilities
>> PurgeUnused 90 # Unused tokens
>> PurgeHapaxes 30 # Tokens with less than 5 hits (hapaxes)
>> PurgeHits1S 15 # Tokens with only 1 spam hit
>> PurgeHits1I 15 # Tokens with only 1 innocent hit
>>
>>
>> I disabled dspam_clean and dspam_logrotate on the dspam servers, and
>> execute them on the file server directly.
>>
>> I tried to use the postgresql driver, but it used a lot of resources.
>>
>> Can you guys give me some suggestions? The database is getting bigger
>> and I don't know if I'm following the best maintenance routine.
>>
>> Thanks!
>
> Hi,
>
> I would be leery of using the hash backend for a system with that many
> users using individual training. You are only using ~1MB/user. What
> tokenizer are you using? I would expect you to need much more room per
> user as the training progresses, 10-100MB each. I think your disk usage
> is going to continue to increase to the point that using a PostgreSQL
> backend would make sense. How are you planning to handle a hash file
> becoming corrupt?
>
> Regards,
> Ken
>

I'm using the OSB tokenizer. The database is new, which is why it is so small today.
I'm planning to move it to an SQL backend. Which database does DSPAM work best with? I saw some PostgreSQL schema optimizations, but maybe MySQL uses fewer resources? What are your experiences?

Thanks!

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
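For anyone following the sizing argument in this thread, here is a rough back-of-the-envelope sketch in Python. It only restates the numbers quoted above (100,000 users, roughly 100 GB today, Ken's expected 10-100 MB per user). The function names are made up for illustration, decimal units (1 GB = 1000 MB) are assumed, and the user count is treated as fixed, so take it as an estimate and not a measurement.

```python
# Back-of-the-envelope sizing using only figures quoted in this thread.
# Assumptions: decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB) and a
# constant user count. Function names are hypothetical, not part of DSPAM.

USERS = 100_000                      # users in the mail structure
CURRENT_TOTAL_GB = 100               # database size reported today
PROJECTED_MB_PER_USER = (10, 100)    # Ken's expected range per user


def per_user_mb(total_gb, users):
    """Average storage per user in MB for a given total size in GB."""
    return total_gb * 1000 / users


def projected_total_tb(mb_per_user, users):
    """Total storage in TB if every user needs mb_per_user MB."""
    return mb_per_user * users / 1_000_000


current = per_user_mb(CURRENT_TOTAL_GB, USERS)
low, high = (projected_total_tb(mb, USERS) for mb in PROJECTED_MB_PER_USER)

print(f"current usage: {current:.1f} MB/user")
print(f"projected total: {low:.1f} TB to {high:.1f} TB")
```

With these inputs the current usage works out to about 1 MB per user, and the projected total lands between roughly 1 TB and 10 TB, which is the growth Ken is warning about.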