On 16/08/2012 21:49, Stevan Bajić wrote:
Hello Christophe,

Hello Stevan, thanks for answering, I really appreciate it.

Spam AND Ham? Really?
Oh yes. And now I did the same for another user:

   tmp # dspam_train t...@garault.org spam ham

   dspam # dspam_stats
   t...@garault.org  TP:     0 TN:   939 FP:     0 FN:   441 SC:     0 NC:     0


    dspam # dspam_stats
    christo...@garault.org  TP:     0 TN:  5139 FP:     0 FN:  4868 SC:     0 NC:     0


So you have here 5'139 messages that got classified as HAM ([T]rue [N]egative) and 4'868 messages that got falsely classified as HAM ([F]alse [N]egative). This is very, very strange. How did DSPAM end up with only TN and FN counts after processing almost 10K messages, and not a single TP or FP?
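
For readers following along, here is a minimal sketch of how those counters can be pulled out of the `dspam_stats` output and totalled. It assumes only the `TP:/TN:/FP:/FN:/SC:/NC:` layout pasted above; the helper names `parse_dspam_stats` and `summarize` are made up for illustration and are not part of DSPAM.

```python
import re

def parse_dspam_stats(text):
    """Extract the counters from one user's dspam_stats output.

    Works whether or not the mail client wrapped the line, because it
    only looks for "NAME: number" pairs anywhere in the text.
    """
    pairs = re.findall(r"\b(TP|TN|FP|FN|SC|NC):\s*(\d+)", text)
    return {name: int(value) for name, value in pairs}

def summarize(counters):
    """Group the counters by what DSPAM decided at classification time."""
    classified_ham = counters["TN"] + counters["FN"]   # delivered as ham
    classified_spam = counters["TP"] + counters["FP"]  # caught as spam
    return {
        "total": classified_ham + classified_spam,
        "classified_ham": classified_ham,
        "classified_spam": classified_spam,
        "actual_spam": counters["TP"] + counters["FN"],
    }

# Sample copied from the output above (address left truncated as posted).
sample = (
    "christo...@garault.org  TP:     0 TN:  5139 FP:     0 FN:  4868\n"
    "    SC:     0 NC:     0"
)

counters = parse_dspam_stats(sample)
summary = summarize(counters)
print(summary)
```

With the numbers above, every one of the roughly 10K messages ends up in the "classified as ham" group, which is the oddity being discussed.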

Can I make a guess? You are using sbph as Tokenizer.
Nice try but it's osb. ;)

Something is fishy on your setup. Can you please post your dspam.conf?
Yeah sure, here it is:

dspam # egrep -v "^#.*|^$" /etc/dspam/dspam.conf
Home /var/spool/dspam
StorageDriver /usr/lib/x86_64-linux-gnu/dspam/libpgsql_drv.so
TrustedDeliveryAgent "/usr/bin/procmail"       # Linux
UntrustedDeliveryAgent "/usr/bin/procmail -d %u"
DeliveryHost            127.0.0.1
DeliveryPort            10034
DeliveryIdent           localhost
DeliveryProto           SMTP
EnablePlusedDetail      on
OnFail error
Trust root
Trust dspam
Trust postfix
Trust www-data
Trust mail
Trust daemon
Trust amavis
TrainingMode teft
TestConditionalTraining on
Feature noise
Feature whitelist
Feature tb=5
Algorithm graham burton
Tokenizer osb
PValue bcr
WebStats on
ImprobabilityDrive on
Preference "trainingMode=TEFT"          # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
Preference "spamAction=tag"             # { quarantine | tag | deliver } -> default:quarantine
Preference "spamSubject=[SPAM]"         # { string } -> default:[SPAM]
Preference "statisticalSedation=5"      # { 0 - 10 } -> default:0
Preference "enableBNR=on"               # { on | off } -> default:off
Preference "enableWhitelist=on"         # { on | off } -> default:on
Preference "signatureLocation=header" # { message | headers } -> default:message
Preference "tagSpam=on"                 # { on | off }
Preference "tagNonspam=off"             # { on | off }
Preference "showFactors=on"             # { on | off } -> default:off
Preference "optIn=off"                  # { on | off }
Preference "optOut=on"                  # { on | off }
Preference "whitelistThreshold=20"      # { Integer } -> default:10
Preference "makeCorpus=off"             # { on | off } -> default:off
Preference "storeFragments=off"         # { on | off } -> default:off
Preference "localStore="                # { on | off } -> default:username
Preference "processorBias=on"           # { on | off } -> default:on
Preference "fallbackDomain=off"         # { on | off } -> default:off
Preference "trainPristine=off"          # { on | off } -> default:off
Preference "optOutClamAV=off"           # { on | off } -> default:off
Preference "ignoreRBLLookups=off"       # { on | off } -> default:off
Preference "RBLInoculate=off"           # { on | off } -> default:off
Preference "notifications=off"          # { on | off } -> default:off
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride fallbackDomain
AllowOverride ignoreGroups
AllowOverride ignoreRBLLookups
AllowOverride localStore
AllowOverride makeCorpus
AllowOverride optIn
AllowOverride optOut
AllowOverride optOutClamAV
AllowOverride processorBias
AllowOverride RBLInoculate
AllowOverride showFactors
AllowOverride signatureLocation
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride statisticalSedation
AllowOverride storeFragments
AllowOverride tagNonspam
AllowOverride tagSpam
AllowOverride trainPristine
AllowOverride trainingMode
AllowOverride whitelistThreshold
AllowOverride dailyQuarantineSummary
AllowOverride notifications
IgnoreHeader Accept-Language
IgnoreHeader Authentication-Results
IgnoreHeader Content-Type
IgnoreHeader DKIM-Signature
IgnoreHeader Date
IgnoreHeader DomainKey-Signature
IgnoreHeader Importance
IgnoreHeader In-Reply-To
IgnoreHeader List-Archive
IgnoreHeader List-Help
IgnoreHeader List-Id
IgnoreHeader List-Post
IgnoreHeader List-Subscribe
IgnoreHeader List-Unsubscribe
IgnoreHeader Message-ID
IgnoreHeader Message-Id
IgnoreHeader Organization
IgnoreHeader Received
IgnoreHeader Received-SPF
IgnoreHeader References
IgnoreHeader Reply-To
IgnoreHeader Resent-Date
IgnoreHeader Resent-From
IgnoreHeader Thread-Index
IgnoreHeader Thread-Topic
IgnoreHeader User-Agent
IgnoreHeader X-policyd-weight
IgnoreHeader thread-index
PurgeSignature  off     # Specified in purge.sql
PurgeNeutral    90
PurgeUnused     off     # Specified in purge.sql
PurgeHapaxes    off     # Specified in purge.sql
PurgeHits1S     off     # Specified in purge.sql
PurgeHits1I     off     # Specified in purge.sql
LocalMX 127.0.0.1
SystemLog       on
UserLog         on
Opt in
ParseToHeaders on
ChangeModeOnParse on
ChangeUserOnParse full
MaxMessageSize 26214400
ServerHost              127.0.0.1
ServerPort              10033
ServerQueueSize         32
ServerPID               /var/run/dspam/dspam.pid
ServerMode auto
ServerParameters        "--deliver=innocent -d %u"
ServerIdent             "dspam.garault"
ProcessorURLContext on
ProcessorBias on
StripRcptDomain off
Include /etc/dspam/dspam.d/



I now have more than 4 million rows in dspam_token_data for this user (me).

This is a lot. Just for roughly 10K messages?
The strange thing is that I don't seem to have 10K messages, even though they were all given to dspam_train:

dspam=# select count(*) from dspam_signature_data;
 count
-------
  5860


dspam=# select count(*) from dspam_token_data;
  count
---------
 4594613
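
As a rough sanity check on those two counts, here is a small sketch of the tokens-per-message arithmetic. The 10'007 figure is simply TN + FN from the dspam_stats output earlier in this message. Both queries are plain `count(*)` with no per-user filter, so treating them as numbers for a single user is an assumption.

```python
# Row counts copied from the two queries above.
token_rows = 4_594_613      # select count(*) from dspam_token_data
signature_rows = 5_860      # select count(*) from dspam_signature_data

# TN + FN from the dspam_stats output earlier in this message.
trained_messages = 5_139 + 4_868

def tokens_per_message(tokens, messages):
    """Average number of token rows per message."""
    return tokens / messages

per_signature = tokens_per_message(token_rows, signature_rows)
per_trained = tokens_per_message(token_rows, trained_messages)

print(f"per stored signature: {per_signature:.0f}")
print(f"per trained message:  {per_trained:.0f}")
```

That works out to about 784 token rows per stored signature, or about 459 per message if all 10'007 were really processed.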



What version of DSPAM is that?
dspam # dspam --version

DSPAM Anti-Spam Suite 3.10.1 (agent/library)

Copyright (C) 2002-2011 DSPAM Project
http://dspam.sourceforge.net.

DSPAM may be copied only under the terms of the GNU Affero General Public
License, a copy of which can be found with the DSPAM distribution kit.

Configuration parameters: '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--build=x86_64-linux-gnu' '--host=x86_64-linux-gnu' '--sysconfdir=/etc/dspam' '--disable-dependency-tracking' '--enable-split-configuration' '--enable-static' '--enable-external-lookup' '--enable-syslog' '--with-logdir=/var/log/dspam/' '--with-dspam-home=/var/spool/dspam' '--enable-domain-scale' '--with-delivery-agent=/usr/bin/procmail' '--enable-daemon' '--with-mysql-includes=/usr/include/mysql' '--with-pgsql-includes=/usr/include/postgresql' '--with-storage-driver=hash_drv,mysql_drv,pgsql_drv,sqlite3_drv' '--enable-debug' '--enable-virtual-users' '--enable-preferences-extension' '--enable-clamav' 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-z,relro -Wl,-z,defs -Wl,--as-needed' 'CPPFLAGS=-D_FORTIFY_SOURCE=2'

And again thanks for your help Stevan.

--
"The trouble with quotes on the Internet is that it is hard to know whether
they are authentic." -- Napoléon Bonaparte.
