[Dspam-user] Delivering emails twice

Christoph Pleger Mon, 26 Dec 2011 04:45:24 -0800

Hello,

When dspam does not classify an email as spam, the email is delivered to my 
INBOX and contains, for example, the following dspam headers:


X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sun Dec 25 20:43:32 2011
X-DSPAM-Confidence: 0.9899
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 4ef77ce45055967287547

If an email has been misclassified, I move it to my Spam folder. With the help 
of a cron job and some magic in my IMAP server, this automatically makes dspam 
retrain the email as spam. But during retraining something happens that seems 
strange to me: the email is delivered again, this time with these dspam 
headers:

X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sun Dec 25 20:43:32 2011
X-DSPAM-Confidence: 0.9899
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 4ef77ce45055967287547
X-DSPAM-Result: Spam
X-DSPAM-Processed: Sun Dec 25 22:15:02 2011
X-DSPAM-Confidence: 0.4840
X-DSPAM-Probability: 1.0000
X-DSPAM-Signature: 4ef79256273581804284693

That is, the old dspam headers remain unchanged and additional dspam headers 
are appended.
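
To make the two processing passes easier to see, here is a minimal Python sketch that groups the X-DSPAM-* headers of the redelivered message by pass. The header values are copied from the message above. The function name `dspam_passes` and the rule that each pass begins with an X-DSPAM-Result header are assumptions made for illustration, not documented dspam behaviour.

```python
from email import message_from_string

# Header block copied from the redelivered message; the trailing blank
# line marks the end of the headers.
RAW = """\
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sun Dec 25 20:43:32 2011
X-DSPAM-Confidence: 0.9899
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 4ef77ce45055967287547
X-DSPAM-Result: Spam
X-DSPAM-Processed: Sun Dec 25 22:15:02 2011
X-DSPAM-Confidence: 0.4840
X-DSPAM-Probability: 1.0000
X-DSPAM-Signature: 4ef79256273581804284693

"""


def dspam_passes(raw_message):
    """Group X-DSPAM-* headers into one dict per processing pass.

    Assumption: every pass starts with an X-DSPAM-Result header,
    as in the sample above.
    """
    msg = message_from_string(raw_message)
    passes = []
    # Message.items() yields all headers in order, including duplicates.
    for name, value in msg.items():
        lowered = name.lower()
        if not lowered.startswith("x-dspam-"):
            continue
        if lowered == "x-dspam-result" or not passes:
            passes.append({})
        passes[-1][name] = value
    return passes


passes = dspam_passes(RAW)
for number, headers in enumerate(passes, start=1):
    print(number, headers["X-DSPAM-Result"], headers["X-DSPAM-Signature"])
```

Run against the headers above, this reports two passes with two different signatures, which matches the observation that the message was processed a second time rather than only retrained.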

Is this behaviour (delivering the email twice and adding additional headers 
the second time) normal, or is it an error in my configuration (see 
attachment)? I am using dspam 3.6.8.

Regards
  Christoph

## $Id: dspam.conf.in,v 1.72 2006/05/14 15:40:42 jonz Exp $
## dspam.conf -- DSPAM configuration file
##

#
# DSPAM Home: Specifies the base directory to be used for DSPAM storage
#
Home /var/spool/dspam

#
# StorageDriver: Specifies the storage driver backend (library) to use.
# You'll only need to set this if you are using dynamic storage driver plugins.
# The default when one storage driver is specified is to statically link. Be 
# sure to include the path to the library if necessary, and some systems may 
# use an extension other than .so.
#
# Options include:
#
#   libmysql_drv.so     libpgsql_drv.so   libsqlite_drv.so
#   libsqlite3_drv.so   libora_drv.so     libhash_drv.so
#
# IMPORTANT: Switching storage drivers requires more than merely changing
# this option. If you do not wish to lose all of your data, you will need to
# migrate it to the new backend before making this change.
#
StorageDriver /usr/lib/dspam/libmysql_drv.so

#
# SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP 
# delivery to deliver your message to the mail server. You will need to 
# configure with --enable-daemon to use host delivery, however you do not need 
# to operate in daemon mode. Specify an IP address or UNIX path to a domain 
# socket below as a host.
#
# If you would like to set up DeliveryHost's on a per-domain basis, use
# the syntax: DeliveryHost.domain.com 1.2.3.4
#
DeliveryHost        /var/run/dovecot/lmtp 
DeliveryIdent       localhost
DeliveryProto       LMTP

#
# OnFail: What to do if local delivery or quarantine should fail. If set
# to "unlearn", DSPAM will unlearn the message prior to exiting with an
# unsuccessful return code. The default option, "error", will not unlearn
# the message but return the appropriate error code. The unlearn option
# is useful on some systems where local delivery failures will cause the
# message to be requeued for delivery, and could result in the message
# being processed multiple times. During a very large failure, however, 
# this could cause a significant load increase.
#
OnFail error

# Trusted Users: Only the users specified below will be allowed to perform
# administrative functions in DSPAM such as setting the active user and
# accessing tools. All other users attempting to run DSPAM will be restricted;
# their uids will be forced to match the active username and they will not be
# able to specify delivery agent privileges or use tools.
#
Trust root
Trust dspam
Trust mail
Trust daemon

#
# Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must
# be compiled with debug support in order to use this option. DSPAM should
# never be running in production with debug active unless you are 
# troubleshooting problems.
#
# DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus
#   process     standard message processing
#   classify    message classification using --classify
#   spam        error correction of missed spam
#   fp          error correction of false positives
#   inoculation message inoculations (source=inoculation)
#   corpus      corpusfed messages (source=corpus)
#
#Debug *
#Debug bob bill
#
#DebugOpt process spam fp

#
# Training Mode: The default training mode to use for all operations, when
# one has not been specified on the commandline or in the user's preferences.
# Acceptable values are: toe, tum, teft, notrain
#
TrainingMode teft

#
# TestConditionalTraining: By default, dspam will retrain certain errors
# until the condition is no longer met. This usually accelerates learning.
# Some people argue that this can increase the risk of errors, however.
#
TestConditionalTraining on

#
# Features: Specify features to activate by default; can also be specified
# on the commandline. See the documentation for a list of available features.
# If _any_ features are specified on the commandline, these are ignored.
#
# NOTE: For standard "CRM114" Markovian weighting, use sbph
#
#Feature sbph
Feature noise
Feature chained
Feature whitelist

# Training Buffer: The training buffer waters down statistics during training.
# It is designed to prevent false positives, but can also dramatically reduce
# dspam's catch rate during initial training. This can be a number from 0
# (no buffering) to 10 (maximum buffering). If you are paranoid about false
# positives, you should probably enable this option.
#Feature tb=5

#
# Algorithms: Specify the statistical algorithms to use, overriding any
# defaults configured in the build. The options are:
#    naive       Naive-Bayesian (All Tokens)
#    graham      Graham-Bayesian ("A Plan for Spam")
#    burton      Burton-Bayesian (SpamProbe)
#    robinson    Robinson's Geometric Mean Test (Obsolete)
#    chi-square  Fisher-Robinson's Chi-Square Algorithm
#
# You may have multiple algorithms active simultaneously, but it is strongly
# recommended that you group Bayesian algorithms with other Bayesian
# algorithms, and any use of Chi-Square remain exclusive.
#
# NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider
#       using 'burton' for slightly better accuracy
#
# Don't mess with this unless you know what you're doing
#
#Algorithm chi-square
#Algorithm naive
Algorithm burton graham

#
# PValue: Specify the technique used for calculating PValues, overriding any
# defaults configured in the build. These options are:
#    graham      Graham's Technique ("A Plan for Spam")
#    robinson    Robinson's Technique 
#    markov      Markovian Weighted Technique
#
# Unlike algorithms, you may only have one of these defined. Use of the
# chi-square algorithm automatically changes this to robinson.
#
# Don't mess with this unless you know what you're doing.
#
#PValue robinson
#PValue markov
PValue graham

#
# SupressWebStats: Enable this if you are not using the CGI, and don't want
# .stats files written.
SupressWebStats on

#
# ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to
# X-DSPAM-Improbability headers
#ImprobabilityDrive on

#
# Preferences: Specify any preferences to set by default, unless otherwise
# overridden by the user (see next section) or a default.prefs file.
# If user or default.prefs are found, the user's preferences will override any
# defaults.
#
Preference "spamAction=deliver"
Preference "signatureLocation=headers"  # 'message' or 'headers'
Preference "showFactors=off"

#
# Overrides: Specifies the user preferences which may override configuration
# and commandline defaults. Any other preferences supplied by an untrusted user
# will be ignored.
#
#AllowOverride trainingMode
#AllowOverride spamAction spamSubject
#AllowOverride statisticalSedation
#AllowOverride enableBNR
#AllowOverride enableWhitelist
#AllowOverride signatureLocation
#AllowOverride showFactors
#AllowOverride optIn optOut
#AllowOverride whitelistThreshold

# --- Hash ---

# HashRecMax: Default number of records to create in the initial segment when
# building hash files. 100,000 yields files 1.6MB in size, but can fill up
# fast, so be sure to increase this (to a million or more) if you're not using
# autoextend.
#
# Primes List:
#  53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613,
#  393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 
#  100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, 
#  4294967291
#
HashRecMax              98317

# HashAutoExtend: Autoextend hash databases when they fill up. This allows
# them to continue to train by adding extents (extensions) to the file. There 
# will be a small delay during the growth process, as everything needs to be 
# closed and remapped. 
#
HashAutoExtend          on  

# HashMaxExtents: The maximum number of extents that may be created in a single
# hash file. Set this to zero for unlimited
#
HashMaxExtents          0

# HashExtentSize: The record size for newly created extents. Creating this too
# small could result in many extents being created. Creating this too large
# could result in excessive disk space usage.
#
HashExtentSize          49157

# HashMaxSeek: The maximum number of records to seek to insert a new record
# before failing or adding a new extent. Setting this too high will exhaustively
# scan each segment and kill performance. Typically, a low value is acceptable
# as even older extents will continue to fill over time.
#
HashMaxSeek             100

# HashConcurrentUser: If you are using a single, stateful hash database in
# daemon mode, specifying a concurrent user will cause the user to be 
# permanently mapped into memory and shared via rwlocks.
#
#HashConcurrentUser     user

# HashConnectionCache: If running in daemon mode, this is the max # of
# concurrent connections that will be supported. NOTE: If you are using
# HashConcurrentUser, this option is ignored, as all connections are read-
# write locked instead of mutex locked.
HashConnectionCache     1000

#
# Ignored headers: If DSPAM is behind other tools which may add a header to
# incoming emails, it may be beneficial to ignore these headers - especially
# if they are coming from another spam filter. If you are _not_ using one of
# these tools, however, leaving the appropriate headers commented out will
# allow DSPAM to use them as telltale signs of forged email.
#
#IgnoreHeader X-Spam-Status
#IgnoreHeader X-Spam-Scanned
IgnoreHeader X-Virus-Scanner-Result
IgnoreHeader X-Virus-Status

#
# Notifications: Enable the sending of notification emails to users (first
# message, quarantine full, etc.)
#
Notifications   off

#
# Purge configuration: Set dspam_clean purge default options, if not otherwise
# specified on the commandline
#
PurgeSignatures 14          # Stale signatures
PurgeNeutral    90          # Tokens with neutralish probabilities
PurgeUnused     90          # Unused tokens
PurgeHapaxes    30          # Tokens with less than 5 hits (hapaxes)
PurgeHits1S     15          # Tokens with only 1 spam hit
PurgeHits1I     15          # Tokens with only 1 innocent hit

#
# Local Mail Exchangers: Used for source address tracking, tells DSPAM which
# mail exchangers are local and therefore should be ignored in the Received:
# header when tracking the source of an email. Note: you should use the address
# of the host as appears between brackets [ ] in the Received header.
#
LocalMX 127.0.0.1

#
# Logging: Disabling logging for users will make usage graphs unavailable to
# them. Disabling system logging will make admin graphs unavailable.
#
SystemLog off
UserLog   off

#
# TrainPristine: for systems where the original message remains server side 
# and can therefore be presented in pristine format for retraining. This option
# will cause DSPAM to cease all writing of signatures and DSPAM headers to the 
# message, and deliver the message in as pristine format as possible. This mode
# REQUIRES that the original message in its pristine format (as of delivery) 
# be presented for retraining, as in the case of webmail, imap, or other 
# applications where the message is actually kept server-side during reading, 
# and is preserved. DO NOT use this switch unless the original message can be 
# presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS.
#
TrainPristine off

#
# Opt: in or out; determines DSPAM's default filtering behavior. If this value
# is set to in, users must opt-in to filtering by dropping a .dspam file in
# /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam
# folder in their home directory).  The default is opt-out, which means all 
# users will be filtered unless a .nodspam file is dropped in 
# /var/dspam/opt-out/user.nodspam
#
Opt out 

#
# ParseToHeaders: In lieu of setting up individual aliases for each user,
# DSPAM can be configured to automatically parse the To: address for spam and
# false positive forwards. From there, it can be configured to either set the
# DSPAM user based on the username specified in the header and/or change the
# training class and source accordingly. The options below can be used to 
# customize most common types of header parsing behavior to avoid the need for
# multiple aliases, or if using LMTP, aliases entirely.
#
# ParseToHeader: Parse the To: headers of an incoming message. This must be
#                set to 'on' to use either of the following features.
# 
# ChangeModeOnParse: Automatically change the class (to spam or innocent)
#   depending on whether spam- or notspam- was specified, and change the source
#   to 'error'. This is convenient if you're not using aliases at all, but
#   are delivering via LMTP.
#
# ChangeUserOnParse: Automatically change the username to match that specified
#   in the To: header. For example, spam-...@domain.tld will set the username
#   to bob, ignoring any --user passed in. This may not always be desirable if
#   you are using virtual email addresses as usernames. Options:
#     on or user        take the portion before the @ sign only
#     full              take everything after the initial {spam,notspam}-.
#
#ParseToHeaders on
#ChangeModeOnParse on
#ChangeUserOnParse on

#
# Broken MTA Options: Some MTAs don't support the proper functionality
# necessary. In these cases you can activate certain features in DSPAM to
# compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if
# the message is spam, 0 if not, or a negative code if an error has occurred.
# Specifying 'case' causes DSPAM to force the input usernames to lowercase.
# Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
# in.
#
#Broken returnCodes
#Broken case
#Broken lineStripping

#
# MaxMessageSize: You may specify a maximum message size for DSPAM to process.
# If the message is larger than the maximum size, it will be delivered 
# without processing. Value is in bytes.
#
#MaxMessageSize 4194304

# If you wish to use a local domain socket instead of a TCP socket, uncomment
# the following. It is strongly recommended you use local domain sockets if
# you are running the client and server on the same machine, as it eliminates
# much of the bandwidth overhead.
#
ServerDomainSocketPath  "/var/run/dspam/dspam.sock"

#
# ServerMode specifies the type of LMTP server to start. This can be one of:
#     dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc
#  standard: Standard LMTP server, for communicating with Postfix or other MTA
#      auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT
#
ServerMode standard

# If supporting standard LMTP mode, server parameters will need to be specified
# here, as they will not be passed in by the mail server. The ServerIdent
# specifies the 250 response code ident sent back to connecting clients and
# should be set to the hostname of your server, or an alias.
#
# NOTE: If you specify --user in ServerParameters, the RCPT TO will be
#       used only for delivery, and not set as the active user for processing.
#
ServerParameters        "--deliver=innocent,spam"
ServerIdent             "joseph.pleger.local"

# ProcessorWordFrequency: By default, words are only counted once per message.
# If you are classifying large documents, however, you may wish to count once
# per occurrence instead.
#
#ProcessorWordFrequency  occurrence

# ProcessorBias: Bias causes the filter to lean more toward 'innocent', and
# usually greatly reduces false positives. It is the default behavior of
# most Bayesian filters (including dspam). 
#
# NOTE: You probably DON'T want this if you're using Markovian Weighting, unless
# you are paranoid about false positives.
#
ProcessorBias on

# Include a directory with configuration items.
Include /etc/dspam/dspam.d/

## EOF
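
Since most of the file above is commentary, a short Python sketch can reduce it to its active directives for easier review. The sample lines are copied from the attachment. The helper name `active_directives` is hypothetical, and the shell-like tokenizing via `shlex` is an assumption that may differ from how dspam itself parses the file. The sketch also does not follow the `Include` directive.

```python
import shlex

# A few lines copied from the attached dspam.conf, including commented-out
# directives and trailing inline comments.
SAMPLE = '''\
#
# SMTP or LMTP Delivery
#
DeliveryHost        /var/run/dovecot/lmtp
DeliveryProto       LMTP
Preference "spamAction=deliver"
Preference "signatureLocation=headers"  # 'message' or 'headers'
#Feature tb=5
PurgeSignatures 14          # Stale signatures
ServerParameters        "--deliver=innocent,spam"
'''


def active_directives(conf_text):
    """Return {directive: [argument lists]} for all non-comment lines.

    Assumption: '#' starts a comment and double quotes group one argument,
    which is how the attached file appears to be written.
    """
    directives = {}
    for line in conf_text.splitlines():
        # comments=True drops everything from an unquoted '#' onwards.
        tokens = shlex.split(line, comments=True)
        if tokens:
            directives.setdefault(tokens[0], []).append(tokens[1:])
    return directives


conf = active_directives(SAMPLE)
for name, argument_lists in conf.items():
    for arguments in argument_lists:
        print(name, *arguments)
```

Directives that may appear more than once, such as `Preference`, are kept as a list of argument lists so that no occurrence is lost.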

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
