Re: [Dspam-user] dspam scalability

Stevan Bajić Wed, 09 May 2012 03:07:17 -0700

Hello Radim,


On 09.05.2012 10:50, Radim Kolar wrote:
> This is one side of the story. Show us your dspam.conf
>
> ponto# cat dspam.conf
> Home /var/db/dspam
> StorageDriver /usr/local/lib/dspam/libpgsql_drv.so
> TrustedDeliveryAgent "/usr/local/bin/maildrop"
> FallbackDomains off
> OnFail error
> Trust root
> Trust dspam
> Trust apache
> Trust mail
> Trust mailnull
> Trust smmsp
> Trust daemon
> Trust admin
> Trust dovecot
> TrainingMode tum
This is the problem. Normally it would not be one, but in your case it 
is. You complain about too much data for just 4 users. TUM works exactly 
like TEFT until training mode is down to zero, so in the beginning TUM 
produces as many tokens (and as much data) as TEFT.

> TestConditionalTraining on
> Feature noise
> Feature whitelist
> Algorithm graham burton
> Tokenizer osb
> PValue bcr
> WebStats on
> ImprobabilityDrive on
> Preference "trainingMode=TUM"           # { TOE | TUM | TEFT | NOTRAIN }
> ->  default:teft
Same here.

> Preference "spamAction=tag"             # { quarantine | tag | deliver }
> ->  default:quarantine
> Preference "spamSubject=[SPAM]"         # { string } ->  default:[SPAM]
> Preference "statisticalSedation=5"      # { 0 - 10 } ->  default:0
> Preference "enableBNR=on"               # { on | off } ->  default:off
> Preference "enableWhitelist=on"         # { on | off } ->  default:on
> Preference "signatureLocation=message"  # { message | headers } ->
> default:message
> Preference "tagSpam=off"                # { on | off }
> Preference "tagNonspam=off"             # { on | off }
> Preference "showFactors=off"            # { on | off } ->  default:off
> Preference "optIn=off"                  # { on | off }
> Preference "optOut=off"                 # { on | off }
> Preference "whitelistThreshold=20"      # { Integer } ->  default:10
> Preference "makeCorpus=off"             # { on | off } ->  default:off
> Preference "storeFragments=off"         # { on | off } ->  default:off
> Preference "localStore="                # { on | off } ->  default:username
> Preference "processorBias=on"           # { on | off } ->  default:on
> Preference "fallbackDomain=off"         # { on | off } ->  default:off
> Preference "trainPristine=off"          # { on | off } ->  default:off
> Preference "optOutClamAV=off"           # { on | off } ->  default:off
> Preference "ignoreRBLLookups=off"       # { on | off } ->  default:off
> Preference "RBLInoculate=off"           # { on | off } ->  default:off
> Preference "notifications=off"          # { on | off } ->  default:off
> AllowOverride enableBNR
> AllowOverride enableWhitelist
> AllowOverride fallbackDomain
> AllowOverride ignoreGroups
> AllowOverride ignoreRBLLookups
> AllowOverride localStore
> AllowOverride makeCorpus
> AllowOverride optIn
> AllowOverride optOut
> AllowOverride optOutClamAV
> AllowOverride processorBias
> AllowOverride RBLInoculate
> AllowOverride showFactors
> AllowOverride signatureLocation
> AllowOverride spamAction
> AllowOverride spamSubject
> AllowOverride statisticalSedation
> AllowOverride storeFragments
> AllowOverride tagNonspam
> AllowOverride tagSpam
> AllowOverride trainPristine
> AllowOverride trainingMode
> AllowOverride whitelistThreshold
> AllowOverride dailyQuarantineSummary
> AllowOverride notifications
> PgSQLServer             /tmp/
> PgSQLUser               dspam
> PgSQLDb         dspam
> PgSQLConnectionCache    2
> PgSQLUIDInSignature     on
> Notifications   on
> PurgeSignatures 30      # Stale signatures
> PurgeNeutral    90      # Tokens with neutralish probabilities
> PurgeUnused     90      # Unused tokens
> PurgeHapaxes    60      # Tokens with less than 5 hits (hapaxes)
> PurgeHits1S     30      # Tokens with only 1 spam hit
> PurgeHits1I     30      # Tokens with only 1 innocent hit
> LocalMX 127.0.0.1 64.6.108.239
> SystemLog       on
> UserLog         on
> Opt out
> ClamAVPort              3310
> ClamAVHost              127.0.0.1
> ClamAVResponse          reject
> ServerHost              127.0.0.1
> ServerPort              24
> ServerQueueSize 32
> # keep this is sync with /usr/local/etc/rc.d/dspam rc script
> ServerPID               /var/run/dspam.pid
> ServerMode dspam
> ServerDomainSocketPath  "/var/run/dspam.sock"
> ClientHost      "/var/run/dspam.sock"
> ClientIdent     "secret@Relay1"
> ProcessorURLContext on
> ProcessorBias on
> StripRcptDomain off
>
> dspam --version
>
> DSPAM Anti-Spam Suite 3.10.1 (agent/library)
>
> Copyright (C) 2002-2011 DSPAM Project
> http://dspam.sourceforge.net.
>
> DSPAM may be copied only under the terms of the GNU Affero General Public
> License, a copy of which can be found with the DSPAM distribution kit.
>
> Configuration parameters:  '--sysconfdir=/usr/local/etc'
> '--with-logdir=/var/log/dspam' '--with-dspam-home=/var/db/dspam'
> '--with-dspam-home-owner=root' '--with-dspam-home-group=mail'
> '--with-dspam-home-mode=0770' '--with-dspam-owner=root'
> '--with-dspam-group=mail' '--enable-syslog' '--enable-debug'
> '--enable-daemon' '--enable-clamav'
> '--with-pgsql-includes=/usr/local/include'
> '--with-pgsql-libraries=/usr/local/lib'
> '--with-storage-driver=pgsql_drv'
> '--with-delivery-agent=/usr/local/bin/maildrop' '--with-dspam-mode=4511'
> '--enable-logging' '--enable-user-logging' '--prefix=/usr/local'
> '--mandir=/usr/local/man' '--infodir=/usr/local/info/'
> '--build=amd64-portbld-freebsd8.2'
> 'build_alias=amd64-portbld-freebsd8.2' 'CC=cc' 'CFLAGS=-pipe -g'
> 'LDFLAGS= -L/usr/local/lib' 'CPPFLAGS=-I/usr/local/include' 'CPP=cpp'
>
> , output of dspam_admin list preference default
> ponto:(admin)~>sudo dspam_admin list preference default
> ponto:(admin)~>sudo dspam_admin aggregate preference default
> trainingMode=TUM
Here as well.

> spamAction=tag
> spamSubject=[SPAM]
> statisticalSedation=5
> enableBNR=on
> enableWhitelist=on
> signatureLocation=message
> tagSpam=off
> tagNonspam=off
> showFactors=off
> optIn=off
> optOut=off
> whitelistThreshold=20
> makeCorpus=off
> storeFragments=off
> localStore=
> processorBias=on
> fallbackDomain=off
> trainPristine=off
> optOutClamAV=off
> ignoreRBLLookups=off
> RBLInoculate=off
> notifications=off
>
>   >  content of your DSPAM group file
> i know nothing about this file
You should! A MERGED group that you use globally (aka a global merged 
group) lets you build one big set of training data and share it with all 
the users. Allow me to explain with an artificial example that 
illustrates the benefit for you:
global merged group: 1'000'000 spam messages trained /
                     1'000'000 innocent messages trained -> 500 MB data
user 1: using all data from 'global merged group' + his own data:
        maybe 100 spam messages / 100 innocent messages -> 10 MB data
user 2: using all data from 'global merged group' + his own data:
        maybe 10 spam messages / 10 innocent messages -> 1 MB data
user 3: using all data from 'global merged group' + his own data:
        maybe 100 spam messages / 100 innocent messages -> 10 MB data
user 4: using all data from 'global merged group' + his own data:
        maybe 10 spam messages / 10 innocent messages -> 1 MB data
user n: using all data from 'global merged group' + his own data:
        maybe 100 spam messages / 100 innocent messages -> 10 MB data

So the more users you have, the less data is used per user, since they 
all share the data from the global merged group. For your setup I would 
strongly suggest creating that global merged group and training it with 
dspam_train. Search the mailing list for details on how to do that. In 
the past two to three weeks I have explained one approach to doing it.
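
As a rough illustration only, a global merged group entry in the group 
file might look like the sketch below. It assumes the group file sits in 
the DSPAM home directory (/var/db/dspam/group, going by the Home setting 
quoted above), that "globalgroup" is just a placeholder name, and that 
"*" as the member list stands for all users, which is how I understand 
the DSPAM README. Please check the README shipped with your version 
before relying on the exact syntax.

```
# /var/db/dspam/group (path assumed from "Home /var/db/dspam")
# format: groupname:type:members
# "globalgroup" is a placeholder name, "*" is assumed to mean all users
globalgroup:merged:*
```

The group name is then the user you would hand to dspam_train, together 
with a directory of spam and a directory of innocent mail. The dspam_train 
man page has the exact argument order.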


btw: your dspam.conf looks okay. I have nothing to complain about. Maybe 
you should consider adding a training buffer (Feature tb=n) if you want 
to lower fp/fn during training?
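
A minimal sketch of that line in dspam.conf is shown below. The value 5 
is only an example, not a recommendation for your setup. As far as I 
know, n can range from 0 (no buffering) to 10 (maximum buffering), and a 
higher value can also lower the catch rate during initial training.

```
# dspam.conf: training buffer, example value only
Feature tb=5
```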

> Let me guess: You run TEFT, use word or chain or even sbph as tokenizer,
> you are not using groups and you never run dspam_clean. Right?
>   >However.... could it be that you are from Asia?
> nope
Okay. Sorry. My bad.


> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>


-- 
Kind Regards from Switzerland,

Stevan Bajić


