Re: [Dspam-user] training time?

Terry Barnum Sat, 10 Apr 2010 11:43:46 -0700

On Apr 10, 2010, at 3:27 AM, Stevan Bajić wrote:

> On Fri, 9 Apr 2010 23:23:16 -0700
> Terry Barnum <[email protected]> wrote:
> 
>> 
>> On Apr 9, 2010, at 7:21 PM, Stevan Bajić wrote:
>> 
>>> On Fri, 9 Apr 2010 19:00:54 -0700
>>> Terry Barnum <[email protected]> wrote:
>>> 
>>>> I've been running DSPAM for approximately 2 weeks and looking at the 
>>>> output of dspam_stats, I'm curious how long training normally takes.
>>>> 
>>>> A script is run nightly to check .Junk mailboxes for false negatives and 
>>>> .NotJunk mailboxes for false positives and retrains on error. (Richard 
>>>> Valk's http://switch.richard5.net/serverinstall/train.dspam)
>>>> 
>>>> Here's sample output from dspam_stats -H
>>>> 
>>>> [email protected]:
>>>>            TP True Positives:                     0
>>>>            TN True Negatives:                    19
>>>>            FP False Positives:                    0
>>>>            FN False Negatives:                  348
>>>>            SC Spam Corpusfed:                     0
>>>>            NC Nonspam Corpusfed:                  0
>>>>            TL Training Left:                   2481
>>>>            SHR Spam Hit Rate                  0.00%
>>>>            HSR Ham Strike Rate:               0.00%
>>>>            PPV Positive predictive value:   100.00%
>>>>            OCA Overall Accuracy:              5.18%
>>>> 
>>>> [email protected]:
>>>>            TP True Positives:                     0
>>>>            TN True Negatives:                     0
>>>>            FP False Positives:                    0
>>>>            FN False Negatives:                 3035
>>>>            SC Spam Corpusfed:                     0
>>>>            NC Nonspam Corpusfed:                  0
>>>>            TL Training Left:                   2500
>>>>            SHR Spam Hit Rate                  0.00%
>>>>            HSR Ham Strike Rate:             100.00%
>>>>            PPV Positive predictive value:   100.00%
>>>>            OCA Overall Accuracy:              0.00%
>>>> 
>>>> [email protected]:
>>>>            TP True Positives:                     0
>>>>            TN True Negatives:                     0
>>>>            FP False Positives:                    0
>>>>            FN False Negatives:                  358
>>>>            SC Spam Corpusfed:                     0
>>>>            NC Nonspam Corpusfed:                  0
>>>>            TL Training Left:                   2500
>>>>            SHR Spam Hit Rate                  0.00%
>>>>            HSR Ham Strike Rate:             100.00%
>>>>            PPV Positive predictive value:   100.00%
>>>>            OCA Overall Accuracy:              0.00%
>>>> 
>>>> [email protected]:
>>>>            TP True Positives:                     0
>>>>            TN True Negatives:                     3
>>>>            FP False Positives:                    0
>>>>            FN False Negatives:                 5108
>>>>            SC Spam Corpusfed:                     0
>>>>            NC Nonspam Corpusfed:                  0
>>>>            TL Training Left:                   2497
>>>>            SHR Spam Hit Rate                  0.00%
>>>>            HSR Ham Strike Rate:               0.00%
>>>>            PPV Positive predictive value:   100.00%
>>>>            OCA Overall Accuracy:              0.09%
>>>> 
>>> It looks to me as though you are not using DSPAM at all. It seems that 
>>> only the script from http://switch.richard5.net/serverinstall/train.dspam 
>>> is feeding DSPAM with data in your setup.
>> 
>> Thank you for your help, Stevan. My understanding of how this is supposed to 
>> eventually work is that DSPAM analyzes each email and adds a header marking 
>> it as Innocent or Spam, and the MUA, which is configured to trust the Spam 
>> header, moves mail into the Junk mailbox if DSPAM classified it as Spam. The 
>> MUA has its own junk filtering and moves mail it considers spam into the 
>> Junk mailbox too. So the nightly script may come across mail in the Junk 
>> mailbox that DSPAM misclassified as Innocent but that is actually spam, and 
>> retrain on it as a false negative. Conversely, if DSPAM incorrectly 
>> classifies mail as spam, the user moves that email from the Junk mailbox 
>> into the NotJunk mailbox so the nightly script can retrain on it as a false 
>> positive.
>> 
> So what it does is basically what the Dovecot anti-spam plugin does. The 
> plugin however does it in real time while the script you have there does it 
> on a scheduled basis.
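
The retraining decision described above can be sketched roughly as follows.
This is only an illustration and not Richard Valk's train.dspam script: the
helper name retrain_command and the folder-to-class mapping are assumptions
made for the example. X-DSPAM-Result and X-DSPAM-Signature are the headers
DSPAM adds, and --user, --class, --source and --signature are dspam
command-line options; the sketch only builds the argument list and does not
run anything.

```python
from email import message_from_string

# Hypothetical mapping: the class a message should be retrained as,
# based on the mailbox the nightly job found it in.
FOLDER_CLASS = {".Junk": "spam", ".NotJunk": "innocent"}


def retrain_command(user, folder, raw_message):
    """Return a dspam argv list if the message needs retraining, else None."""
    msg = message_from_string(raw_message)
    result = (msg.get("X-DSPAM-Result") or "").strip().lower()
    signature = (msg.get("X-DSPAM-Signature") or "").strip()
    wanted = FOLDER_CLASS.get(folder)
    if wanted is None or not signature:
        return None  # unknown folder, or no DSPAM signature to retrain with
    # Whitelisted mail was delivered as innocent, so treat it the same way.
    seen_as = "innocent" if result in ("innocent", "whitelisted") else result
    if seen_as == wanted:
        return None  # DSPAM already agrees with the folder; nothing to fix
    return ["dspam", "--user", user, "--class=" + wanted,
            "--source=error", "--signature=" + signature]
```

A message tagged Innocent but found in .Junk yields a --class=spam
retraining command (a false negative); a message tagged Spam but found in
.NotJunk yields --class=innocent (a false positive).
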
> 
> 
>> DSPAM appears to be correctly adding headers but so far I've seen only 
>> Whitelisted and Innocent.
>> 
> But how is it possible that TP/TN are 0 almost everywhere? If DSPAM were 
> working properly, TP or TN would increase every time you receive mail.


That's what I'm wondering too. Could the train.dspam script somehow trigger a 
reset of those fields?
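
To keep an eye on those counters, output in the format shown above can be
parsed with a few lines of Python. This is a minimal sketch that assumes the
layout of the dspam_stats -H sample in this thread (a heading line ending in
":" followed by counter lines); the function names parse_stats and
never_classified are made up for the example.

```python
import re

# Matches counter lines such as "TP True Positives:   0".
# Percentage lines (SHR, HSR, PPV, OCA) are deliberately skipped.
COUNTER = re.compile(r"^\s*(TP|TN|FP|FN|SC|NC|TL)\s.*?(\d+)\s*$")


def parse_stats(text):
    """Parse `dspam_stats -H` style output into {user: {code: count}}."""
    stats, user = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(":"):
            user = line[:-1]  # a heading line starts a new per-user block
            stats[user] = {}
        elif user is not None:
            m = COUNTER.match(line)
            if m:
                stats[user][m.group(1)] = int(m.group(2))
    return stats


def never_classified(stats):
    """Users whose TP and TN are both zero."""
    return [u for u, c in stats.items()
            if c.get("TP", 0) == 0 and c.get("TN", 0) == 0]
```

Running never_classified(parse_stats(output)) after each night's job would
show whether TP/TN ever move for a user or are being reset.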

It's very possible I have a stupid mis-configuration problem and I very much 
appreciate the help. This is my first postfix/dovecot install and I'm learning 
something every day.


>>>> Is so much "Training Left" normal? Do I have something misconfigured? Will 
>>>> DSPAM start tagging email as SPAM only after 2500 successfully classified 
>>>> emails?
>>>> 
>>> No. DSPAM is fully functional from day one. The tagging can be turned 
>>> on/off inside dspam.conf or with the preference extension. However... 
>>> turning on/off the tagging has nothing to do with the training left number.
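
Going only by the four sample blocks quoted above, Training Left appears to
track the True Negatives counter: in every block TL + TN comes to 2500. The
check below simply restates that arithmetic. The starting value of 2500 and
the relationship itself are read off the posted numbers; they are not taken
from DSPAM's source or documentation.

```python
# (TN, TL) pairs copied from the four dspam_stats blocks quoted above.
samples = [(19, 2481), (0, 2500), (0, 2500), (3, 2497)]

# Assumption inferred from these numbers only: TL starts at 2500 and
# drops by one for every message counted as a true negative.
ASSUMED_START = 2500

for tn, tl in samples:
    print(tn, tl, tn + tl == ASSUMED_START)
```
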
>>> 
>>> 
>>>> $ dspam --version
>>>> 
>>>> DSPAM Anti-Spam Suite 3.9.0 (agent/library)
>>>> 
>>>> Copyright (c) 2002-2009 DSPAM Project
>>>> http://dspam.sourceforge.net.
>>>> 
>>>> DSPAM may be copied only under the terms of the GNU General Public License,
>>>> a copy of which can be found with the DSPAM distribution kit.
>>>> 
>>>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
>>>> 
>>>> Home /usr/local/var/dspam
>>>> StorageDriver /usr/local/lib/dspam/libmysql_drv.dylib
>>>> TrustedDeliveryAgent "/usr/bin/procmail"
>>>> DeliveryHost               127.0.0.1
>>>> DeliveryPort               10026
>>>> DeliveryIdent              localhost
>>>> DeliveryProto              SMTP
>>>> OnFail error
>>>> Trust root
>>>> Trust dspam
>>>> Trust apache
>>>> Trust mail
>>>> Trust mailnull 
>>>> Trust smmsp
>>>> Trust daemon
>>>> Trust _dspam
>>>> Trust _postfix
>>>> Trust _www
>>>> TrainingMode toe
>>>> TestConditionalTraining on
>>>> Feature whitelist
>>>> Algorithm graham burton
>>>> Tokenizer osb
>>>> PValue bcr
>>>> WebStats on
>>>> Preference "trainingMode=TOE"              # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
>>>> Preference "spamAction=tag"                # { quarantine | tag | deliver } -> default:quarantine
>>>> Preference "spamSubject=[SPAM]"            # { string } -> default:[SPAM]
>>>> Preference "statisticalSedation=5" # { 0 - 10 } -> default:0
>>>> Preference "enableBNR=on"          # { on | off } -> default:off
>>>> Preference "enableWhitelist=on"            # { on | off } -> default:on
>>>> Preference "signatureLocation=headers"     # { message | headers } -> default:message
>>>> Preference "tagSpam=off"           # { on | off }
>>>> Preference "tagNonspam=off"                # { on | off }
>>>> Preference "showFactors=on"                # { on | off } -> default:off
>>>> Preference "optIn=off"                     # { on | off }
>>>> Preference "optOut=off"                    # { on | off }
>>>> Preference "whitelistThreshold=10" # { Integer } -> default:10
>>>> Preference "makeCorpus=off"                # { on | off } -> default:off
>>>> Preference "storeFragments=off"            # { on | off } -> default:off
>>>> Preference "localStore="           # { on | off } -> default:username  <---- ** okay to be blank? **
>>>> 
>>> Yes
>>> 
>>> 
>>>> Preference "processorBias=on"              # { on | off } -> default:on
>>>> Preference "fallbackDomain=off"            # { on | off } -> default:off
>>>> Preference "trainPristine=off"             # { on | off } -> default:off
>>>> Preference "optOutClamAV=off"              # { on | off } -> default:off
>>>> Preference "ignoreRBLLookups=off"  # { on | off } -> default:off
>>>> Preference "RBLInoculate=off"              # { on | off } -> default:off
>>>> AllowOverride enableBNR
>>>> AllowOverride enableWhitelist
>>>> AllowOverride fallbackDomain
>>>> AllowOverride ignoreGroups
>>>> AllowOverride ignoreRBLLookups
>>>> AllowOverride localStore
>>>> AllowOverride makeCorpus
>>>> AllowOverride optIn
>>>> AllowOverride optOut
>>>> AllowOverride optOutClamAV
>>>> AllowOverride processorBias
>>>> AllowOverride RBLInoculate
>>>> AllowOverride showFactors
>>>> AllowOverride signatureLocation
>>>> AllowOverride spamAction
>>>> AllowOverride spamSubject
>>>> AllowOverride statisticalSedation
>>>> AllowOverride storeFragments
>>>> AllowOverride tagNonspam
>>>> AllowOverride tagSpam
>>>> AllowOverride trainPristine
>>>> AllowOverride trainingMode
>>>> AllowOverride whitelistThreshold
>>>> AllowOverride dailyQuarantineSummary
>>>> MySQLServer                /var/mysql/mysql.sock
>>>> MySQLUser          *
>>>> MySQLPass          *
>>>> MySQLDb                    *
>>>> MySQLCompress              false
>>>> MySQLVirtualTable          dspam_virtual_uids
>>>> MySQLVirtualUIDField               uid
>>>> MySQLVirtualUsernameField  username
>>>> MySQLUIDInSignature        on
>>>> HashRecMax         98317
>>>> HashAutoExtend             on  
>>>> HashMaxExtents             0
>>>> HashExtentSize             49157
>>>> HashPctIncrease            10
>>>> HashMaxSeek                10
>>>> HashConnectionCache        10
>>>> Notifications      off
>>>> PurgeSignatures 14 # Stale signatures
>>>> PurgeNeutral       90      # Tokens with neutralish probabilities
>>>> PurgeUnused        90      # Unused tokens
>>>> PurgeHapaxes       30      # Tokens with less than 5 hits (hapaxes)
>>>> PurgeHits1S        15      # Tokens with only 1 spam hit
>>>> PurgeHits1I        15      # Tokens with only 1 innocent hit
>>>> LocalMX 127.0.0.1
>>>> SystemLog  on
>>>> UserLog            on
>>>> Opt out
>>>> ParseToHeaders on
>>>> ChangeModeOnParse on
>>>> ChangeUserOnParse full
>>>> ServerPID          /var/run/dspam.pid
>>>> ServerParameters   "--deliver=innocent,spam"
>>>> ServerIdent                "localhost.local"
>>>> ProcessorURLContext on
>>>> ProcessorBias on
>>>> StripRcptDomain off
>>>> 
>>> What MTA are you using? Postfix? If so, could you post your master.cf and 
>>> your main.cf?
>> 
>> Yes, postfix/dovecot/mysql with virtual users, postgrey, dspam and vacation.
>> 
>> $ postconf -n
>> 
>> broken_sasl_auth_clients = yes
>> command_directory = /opt/local/sbin
>> config_directory = /opt/local/etc/postfix
>> daemon_directory = /opt/local/libexec/postfix
>> data_directory = /opt/local/var/lib/postfix
>> debug_peer_level = 2
>> default_privs = nobody
>> delay_warning_time = 4h
>> home_mailbox = Maildir/
>> html_directory = no
>> mail_owner = _postfix
>> mailq_path = /opt/local/bin/mailq
>> manpage_directory = /opt/local/share/man
>> mydestination = $myhostname, localhost.$mydomain, localhost
>> myhostname = mailbox.dop.com
>> mynetworks = 192.168.0.0/23, 127.0.0.0/8
>> myorigin = $mydomain
>> newaliases_path = /opt/local/bin/newaliases
>> proxy_interfaces = 70.167.15.114
>> queue_directory = /opt/local/var/spool/postfix
>> readme_directory = /opt/local/share/postfix/readme
>> sample_directory = /opt/local/share/postfix/sample
>> sendmail_path = /opt/local/sbin/sendmail
>> setgid_group = _postdrop
>> smtpd_banner = $myhostname ESMTP $mail_name
>> smtpd_helo_required = yes
>> smtpd_helo_restrictions = permit_mynetworks, reject_non_fqdn_helo_hostname
>> smtpd_recipient_restrictions = permit_mynetworks, permit_sasl_authenticated, 
>> reject_non_fqdn_sender, reject_non_fqdn_recipient, 
>> reject_unknown_sender_domain, reject_unknown_recipient_domain, 
>> reject_unauth_pipelining, reject_unauth_destination, 
>> reject_unlisted_recipient, check_helo_access 
>> hash:/opt/local/etc/postfix/helo_checks, check_sender_access 
>> hash:/opt/local/etc/postfix/access_sender, reject_rbl_client 
>> zen.spamhaus.org, reject_rbl_client bl.spamcop.net, check_policy_service 
>> inet:127.0.0.1:60000, check_client_access 
>> pcre:/opt/local/etc/postfix/dspam_filter_access
>> 
> Could you post the content of that /opt/local/etc/postfix/dspam_filter_access 
> file?

$ cat dspam_filter_access
/./     FILTER dspam:dspam
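
For reference, a pattern table like this is scanned from top to bottom and
the first matching pattern decides the result. Because /./ matches any
non-empty string, every client that reaches this check gets
FILTER dspam:dspam. The sketch below mimics that lookup. It uses Python's re
module rather than PCRE, and the lookup helper is hypothetical, not part of
Postfix.

```python
import re

# One (pattern, action) pair per table line, in file order.
TABLE = [(re.compile(r"."), "FILTER dspam:dspam")]


def lookup(table, key):
    """Return the action of the first pattern that matches key, else None."""
    for pattern, action in table:
        if pattern.search(key):
            return action
    return None
```

Any client address or hostname passed to lookup(TABLE, ...) returns the
FILTER action; only an empty string would fall through.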


>> smtpd_reject_unlisted_sender = yes
>> smtpd_sasl_auth_enable = yes
>> smtpd_sasl_local_domain = $myhostname
>> smtpd_sasl_path = private/auth
>> smtpd_sasl_security_options = noanonymous
>> smtpd_sasl_type = dovecot
>> smtpd_sender_restrictions = permit_mynetworks, reject_unknown_address
>> smtpd_tls_cert_file = /opt/local/etc/postfix/ssl/certs/postfix.cert
>> smtpd_tls_key_file = /opt/local/etc/postfix/ssl/private/postfix.key
>> smtpd_tls_loglevel = 1
>> smtpd_tls_security_level = may
>> tls_random_source = dev:/dev/urandom
>> transport_maps = hash:/opt/local/etc/postfix/transport
>> unknown_local_recipient_reject_code = 550
>> virtual_alias_maps = 
>> proxy:mysql:/opt/local/etc/postfix/mysql_virtual_alias_maps.cf
>> virtual_gid_maps = static:102
>> virtual_mailbox_base = /xxxx/xxxx/xxxx/
>> virtual_mailbox_domains = 
>> mysql:/opt/local/etc/postfix/mysql_virtual_mailbox_domains.cf
>> virtual_mailbox_maps = 
>> proxy:mysql:/opt/local/etc/postfix/mysql_virtual_mailbox_maps.cf
>> virtual_minimum_uid = 102
>> virtual_transport = dovecot
>> virtual_uid_maps = static:102
>> 
>> $ cat master.cf | grep -v ^#
>> 
>> smtp      inet  n       -       n       -       -       smtpd
>> dspam          unix  -       n       n       -       10      pipe
>> flags=Ru user=_dspam argv=/usr/local/bin/dspam --deliver=innocent --user ${recipient} -i -f $sender -- $recipient
>> submission inet n       -       n       -       -       smtpd
>> -o smtpd_enforce_tls=yes
>> -o smtpd_tls_security_level=encrypt
>> -o smtpd_sasl_auth_enable=yes
>> -o smtpd_client_restrictions=permit_sasl_authenticated,reject
>> -o milter_macro_daemon_name=ORIGINATING
>> pickup    fifo  n       -       n       60      1       pickup
>> cleanup   unix  n       -       n       -       0       cleanup
>> qmgr      fifo  n       -       n       300     1       qmgr
>> tlsmgr    unix  -       -       n       1000?   1       tlsmgr
>> rewrite   unix  -       -       n       -       -       trivial-rewrite
>> bounce    unix  -       -       n       -       0       bounce
>> defer     unix  -       -       n       -       0       bounce
>> trace     unix  -       -       n       -       0       bounce
>> verify    unix  -       -       n       -       1       verify
>> flush     unix  n       -       n       1000?   0       flush
>> proxymap  unix  -       -       n       -       -       proxymap
>> proxywrite unix -       -       n       -       1       proxymap
>> smtp      unix  -       -       n       -       -       smtp
>> relay     unix  -       -       n       -       -       smtp
>>      -o smtp_fallback_relay=
>> showq     unix  n       -       n       -       -       showq
>> error     unix  -       -       n       -       -       error
>> retry     unix  -       -       n       -       -       error
>> discard   unix  -       -       n       -       -       discard
>> local     unix  -       n       n       -       -       local
>> virtual   unix  -       n       n       -       -       virtual
>> lmtp      unix  -       -       n       -       -       lmtp
>> anvil     unix  -       -       n       -       1       anvil
>> scache    unix  -       -       n       -       1       scache
>> dovecot   unix       -       n       n       -       -       pipe
>> flags=DRhu user=_vmail argv=/opt/local/libexec/dovecot/deliver -f ${sender} -d ${recipient}
>> localhost:10026      inet    n       -       n       -       -       smtpd
>> -o content_filter=
>> -o receive_override_options=no_unknown_recipient_checks,no_header_body_checks,no_address_mappings
>> -o smtpd_helo_restrictions=
>> -o smtpd_client_restrictions=
>> -o smtpd_sender_restrictions=
>> -o smtpd_recipient_restrictions=permit_mynetworks,reject
>> -o mynetworks=127.0.0.0/8
>> -o smtpd_authorized_xforward_hosts=127.0.0.0/8
>> vacation  unix       -       n       n       -       -       pipe
>> flags=Rq user=_vacation argv=/opt/local/var/spool/vacation/vacation.pl -f ${sender} -- ${recipient}
>> 
> Hmm... that looks to me like you are using FILTER to pass messages to DSPAM. 
> Right?

Yes. Is this not a good approach?

Also, I'm not sure if this helps the diagnosis, but here's the output of 
dspam_admin list preference default, which shows the change you suggested to 
force signatureLocation into the headers.

$ sudo dspam_admin list preference default
signatureLocation=headers

Thanks,
-Terry


