Hello Radim,
On 09.05.2012 10:50, Radim Kolar wrote:
> This is one side of the story. Show us your dspam.conf
>
> ponto# cat dspam.conf
> Home /var/db/dspam
> StorageDriver /usr/local/lib/dspam/libpgsql_drv.so
> TrustedDeliveryAgent "/usr/local/bin/maildrop"
> FallbackDomains off
> OnFail error
> Trust root
> Trust dspam
> Trust apache
> Trust mail
> Trust mailnull
> Trust smmsp
> Trust daemon
> Trust admin
> Trust dovecot
> TrainingMode tum

This is the problem. It is not really a problem in general, but in your case it is. You complain about too much data for just 4 users. TUM works exactly like TEFT until training mode is down to zero, so in the beginning TUM produces as many tokens (and as much data) as TEFT.

> TestConditionalTraining on
> Feature noise
> Feature whitelist
> Algorithm graham burton
> Tokenizer osb
> PValue bcr
> WebStats on
> ImprobabilityDrive on
> Preference "trainingMode=TUM" # { TOE | TUM | TEFT | NOTRAIN } -> default:teft

Same here.

> Preference "spamAction=tag" # { quarantine | tag | deliver } -> default:quarantine
> Preference "spamSubject=[SPAM]" # { string } -> default:[SPAM]
> Preference "statisticalSedation=5" # { 0 - 10 } -> default:0
> Preference "enableBNR=on" # { on | off } -> default:off
> Preference "enableWhitelist=on" # { on | off } -> default:on
> Preference "signatureLocation=message" # { message | headers } -> default:message
> Preference "tagSpam=off" # { on | off }
> Preference "tagNonspam=off" # { on | off }
> Preference "showFactors=off" # { on | off } -> default:off
> Preference "optIn=off" # { on | off }
> Preference "optOut=off" # { on | off }
> Preference "whitelistThreshold=20" # { Integer } -> default:10
> Preference "makeCorpus=off" # { on | off } -> default:off
> Preference "storeFragments=off" # { on | off } -> default:off
> Preference "localStore=" # { on | off } -> default:username
> Preference "processorBias=on" # { on | off } -> default:on
> Preference "fallbackDomain=off" # { on | off } -> default:off
> Preference "trainPristine=off" # { on | off } -> default:off
> Preference "optOutClamAV=off" # { on | off } -> default:off
> Preference "ignoreRBLLookups=off" # { on | off } -> default:off
> Preference "RBLInoculate=off" # { on | off } -> default:off
> Preference "notifications=off" # { on | off } -> default:off
> AllowOverride enableBNR
> AllowOverride enableWhitelist
> AllowOverride fallbackDomain
> AllowOverride ignoreGroups
> AllowOverride ignoreRBLLookups
> AllowOverride localStore
> AllowOverride makeCorpus
> AllowOverride optIn
> AllowOverride optOut
> AllowOverride optOutClamAV
> AllowOverride processorBias
> AllowOverride RBLInoculate
> AllowOverride showFactors
> AllowOverride signatureLocation
> AllowOverride spamAction
> AllowOverride spamSubject
> AllowOverride statisticalSedation
> AllowOverride storeFragments
> AllowOverride tagNonspam
> AllowOverride tagSpam
> AllowOverride trainPristine
> AllowOverride trainingMode
> AllowOverride whitelistThreshold
> AllowOverride dailyQuarantineSummary
> AllowOverride notifications
> PgSQLServer /tmp/
> PgSQLUser dspam
> PgSQLDb dspam
> PgSQLConnectionCache 2
> PgSQLUIDInSignature on
> Notifications on
> PurgeSignatures 30 # Stale signatures
> PurgeNeutral 90 # Tokens with neutralish probabilities
> PurgeUnused 90 # Unused tokens
> PurgeHapaxes 60 # Tokens with less than 5 hits (hapaxes)
> PurgeHits1S 30 # Tokens with only 1 spam hit
> PurgeHits1I 30 # Tokens with only 1 innocent hit
> LocalMX 127.0.0.1 64.6.108.239
> SystemLog on
> UserLog on
> Opt out
> ClamAVPort 3310
> ClamAVHost 127.0.0.1
> ClamAVResponse reject
> ServerHost 127.0.0.1
> ServerPort 24
> ServerQueueSize 32
> # keep this is sync with /usr/local/etc/rc.d/dspam rc script
> ServerPID /var/run/dspam.pid
> ServerMode dspam
> ServerDomainSocketPath "/var/run/dspam.sock"
> ClientHost "/var/run/dspam.sock"
> ClientIdent "secret@Relay1"
> ProcessorURLContext on
> ProcessorBias on
> StripRcptDomain off
>
> dspam --version
>
> DSPAM Anti-Spam Suite 3.10.1 (agent/library)
>
> Copyright (C)
> 2002-2011 DSPAM Project
> http://dspam.sourceforge.net.
>
> DSPAM may be copied only under the terms of the GNU Affero General Public
> License, a copy of which can be found with the DSPAM distribution kit.
>
> Configuration parameters: '--sysconfdir=/usr/local/etc'
> '--with-logdir=/var/log/dspam' '--with-dspam-home=/var/db/dspam'
> '--with-dspam-home-owner=root' '--with-dspam-home-group=mail'
> '--with-dspam-home-mode=0770' '--with-dspam-owner=root'
> '--with-dspam-group=mail' '--enable-syslog' '--enable-debug'
> '--enable-daemon' '--enable-clamav'
> '--with-pgsql-includes=/usr/local/include'
> '--with-pgsql-libraries=/usr/local/lib'
> '--with-storage-driver=pgsql_drv'
> '--with-delivery-agent=/usr/local/bin/maildrop' '--with-dspam-mode=4511'
> '--enable-logging' '--enable-user-logging' '--prefix=/usr/local'
> '--mandir=/usr/local/man' '--infodir=/usr/local/info/'
> '--build=amd64-portbld-freebsd8.2'
> 'build_alias=amd64-portbld-freebsd8.2' 'CC=cc' 'CFLAGS=-pipe -g'
> 'LDFLAGS= -L/usr/local/lib' 'CPPFLAGS=-I/usr/local/include' 'CPP=cpp'
>
> , output of dspam_admin list preference default
> ponto:(admin)~>sudo dspam_admin list preference default
> ponto:(admin)~>sudo dspam_admin aggregate preference default
> trainingMode=TUM

Here as well.

> spamAction=tag
> spamSubject=[SPAM]
> statisticalSedation=5
> enableBNR=on
> enableWhitelist=on
> signatureLocation=message
> tagSpam=off
> tagNonspam=off
> showFactors=off
> optIn=off
> optOut=off
> whitelistThreshold=20
> makeCorpus=off
> storeFragments=off
> localStore=
> processorBias=on
> fallbackDomain=off
> trainPristine=off
> optOutClamAV=off
> ignoreRBLLookups=off
> RBLInoculate=off
> notifications=off
>
>
> content of your DSPAM group file
> i know nothing about this file

You should! Creating a MERGED group that you use globally (aka global merged group) will allow you to create one big group and share that data with all the users.
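
A minimal sketch of what such a group file could look like, to make the idea concrete. It assumes the file is named "group" and lives in the DSPAM home (/var/db/dspam in the configuration above), that each line follows the name:type:members layout described in the DSPAM README, and that "*" as the member list makes the merged group global. The group name "globalgroup" is made up for illustration, and the "#" annotation lines may need to be removed if your version does not accept comments in this file. Please check the README shipped with your version before relying on this.

```
# /var/db/dspam/group  (assumed location: "group" file in the DSPAM home)
# Assumed line layout:  group_name:group_type:members
# "globalgroup" is a hypothetical name; "*" is assumed to mean "all users".
globalgroup:merged:*
```

With a setup like this, the shared corpus would be trained under the group name, and each user's own (much smaller) data is combined with it at classification time.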

Allow me to explain (just an artificial example to illustrate the benefit for you):

global merged group: 1'000'000 spam messages trained / 1'000'000 innocent messages trained -> 500 MB data
user 1: using all data from 'global merged group' + his own data: maybe 100 spam messages / 100 innocent messages -> 10 MB data
user 2: using all data from 'global merged group' + his own data: maybe 10 spam messages / 10 innocent messages -> 1 MB data
user 3: using all data from 'global merged group' + his own data: maybe 100 spam messages / 100 innocent messages -> 10 MB data
user 4: using all data from 'global merged group' + his own data: maybe 10 spam messages / 10 innocent messages -> 1 MB data
user n: using all data from 'global merged group' + his own data: maybe 100 spam messages / 100 innocent messages -> 10 MB data

So the more users you have, the less data is used per user, since they all share the data from the global merged group.

For your setup I would strongly suggest creating that global merged group and training it with dspam_train. Search the mailing list for details on how to do that. In the past two to three weeks I have explained one approach to doing it.

btw: your dspam.conf looks okay. I have nothing to complain about. Maybe you should consider adding a training buffer (Feature tb=n) if you want to lower fp/fn during training?

> Let me guess: You run TEFT, use word or chain or even sbph as tokenizer,
> you are not using groups and you never run dspam_clean. Right?
>
>However.... could it be that you are from Asia?
> nope

Okay. Sorry. My bad.

> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats.
> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>

--
Kind Regards from Switzerland,

Stevan Bajić