[Dspam-user] Upgrading from 3.8.0 to 3.10.1 greatly reduced detection quality

René Neumann Fri, 30 Mar 2012 02:39:40 -0700

Hi *,

I upgraded my dspam installation from 3.8.0 to 3.10.1 several weeks ago.
Since then the detection quality has dropped a lot: each day I have to
rescue several mails from Junk because they were wrongly classified as
spam. This didn't happen with the old installation, and it is not getting
better over time. Was it a mistake to keep using the old database? Would it
be wise to drop the old data and retrain?


Also I am a bit puzzled about the new configuration: several options now
appear twice in the config file, once as a normal option and once as a
'Preference' parameter. It is not clear to me which takes precedence, or
what happens if one of them is not set. Perhaps this contributes to the
problem above, as I might have set conflicting options this way. (Why
are these 'Preference' parameters there anyway?)

Thanks,
René

P.S.: I attached my current config :)

## $Id: dspam.conf.in,v 1.100 2011/07/09 00:00:52 sbajic Exp $
## dspam.conf -- DSPAM configuration file
##

#
# DSPAM Home: Specifies the base directory to be used for DSPAM storage
#
Home /var/spool/dspam

#
# StorageDriver: Specifies the storage driver backend (library) to use.
# You'll only need to set this if you are using dynamic storage driver plugins
# from a binary distribution. The default build statically links the storage
# driver (when only one is specified at configure time), overriding this
# setting, which only comes into play if multiple storage drivers are specified
# at configure time. When using dynamic linking, be sure to include the path
# to the library if necessary, and some systems may use an extension other
# than .so (e.g. OSX uses .dylib).
#
# Options include:
#
#   libmysql_drv.so     libpgsql_drv.so   libsqlite_drv.so
#   libsqlite3_drv.so   libhash_drv.so
#
# IMPORTANT: Switching storage drivers requires more than merely changing
# this option. If you do not wish to lose all of your data, you will need to
# migrate it to the new backend before making this change.
#
StorageDriver /usr/lib64/dspam/libmysql_drv.so

#
# Trusted Delivery Agent: Specifies the local delivery agent DSPAM should call
# when delivering mail as a trusted user. Use %u to specify the user DSPAM is
# processing mail for. It is generally a good idea to allow the MTA to specify
# the pass-through arguments at run-time, but they may also be specified here.
#
# Most operating system defaults:
#TrustedDeliveryAgent "/usr/bin/procmail"       # Linux
#TrustedDeliveryAgent "/usr/bin/mail"           # Solaris
#TrustedDeliveryAgent "/usr/libexec/mail.local" # FreeBSD
#TrustedDeliveryAgent "/usr/bin/procmail"       # Cygwin
#
# Other popular configurations:
#TrustedDeliveryAgent "/usr/cyrus/bin/deliver"  # Cyrus
#TrustedDeliveryAgent "/bin/maildrop"           # Maildrop
#TrustedDeliveryAgent "/usr/local/sbin/exim -oMr spam-scanned" # Exim
#
#TrustedDeliveryAgent "/usr/bin/procmail"

#
# Untrusted Delivery Agent: Specifies the local delivery agent and arguments
# DSPAM should use when delivering mail and running in untrusted user mode.
# Because DSPAM will not allow pass-through arguments to be specified to
# untrusted users, all arguments should be specified here. Use %u to specify
# the user DSPAM is processing mail for. This configuration parameter is only
# necessary if you plan on allowing untrusted processing.
#
#UntrustedDeliveryAgent "/usr/bin/procmail -d %u"

#
# SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP
# delivery to deliver your message to the mail server instead of using a
# delivery agent. You will need to configure with --enable-daemon to use host
# delivery, however you do not need to operate in daemon mode. Specify an IP
# address or UNIX path to a domain socket below as a host.
#
# If you would like to set up DeliveryHost's on a per-domain basis, use
# the syntax: DeliveryHost.domain.com 1.2.3.4
#
DeliveryHost        127.0.0.1
DeliveryPort        10025
DeliveryIdent       localhost
DeliveryProto       SMTP
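#
# Illustration only (not part of the original configuration): following the
# per-domain syntax described above, a separate delivery host for a single
# domain might look like this. The domain and address are placeholders.
#
#DeliveryHost.example.org    192.0.2.10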

#
# FallbackDomains: If you want to specify certain domains as fallback domains,
# enable this option. For example, you could create a user @domain.com, and
# if b...@domain.com does not resolve to a known user on the system, the user
# could default to your @domain.com user. NOTE: This also requires designating
# fallbackDomain for the domain name;
# e.g. dspam_admin ch pref domain.com fallbackDomain on
#
#FallbackDomains on

#
# Quarantine Agent: DSPAM's default behavior is to quarantine all mail it
# thinks is spam. If you wish to override this behavior, you may specify
# a quarantine agent which will be called with all messages DSPAM thinks is
# spam. Use %u to specify the user DSPAM is processing mail for.
#
#QuarantineAgent        "/usr/bin/procmail -d spam"

#
# DSPAM can optionally process "plused users" (addresses in the user+detail
# form) by truncating the username just before the "+", so all internal
# processing occurs for "user", but delivery will be performed for
# "user+detail". This is only useful if the LDA can handle "plused users"
# (for example Cyrus IMAP) and when configured for LMTP delivery above
#
#EnablePlusedDetail     on

#
# Character to use as separator between user names and address extensions.
# If you change this value then please adjust QuarantineMailbox to use the
# new specified character. The default is '+'.
#
#PlusedCharacter        +

#
# Turn this feature on if you want to force DSPAM to lowercase the "plused
# users" username.
#
#PlusedUserLowercase    on

#
# Quarantine Mailbox: DSPAM's LMTP code can send spam mail using LMTP to a
# "plused" mailbox (such as user+quarantine) leaving quarantine processing
# for retraining or deletion to be performed by the LDA and the mail client.
# "plused" mailboxes are supported by Cyrus IMAP and possibly other LDAs. If
# you don't set/change PlusedCharacter then the mailbox name must have the +
# since the + is the default used character.
#
#QuarantineMailbox      +quarantine
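#
# Illustration only (not part of the original configuration): a sketch of how
# the plused-user options above fit together, using the hypothetical user
# "alice". Going by the descriptions above, mail for "alice+lists" would be
# processed as user "alice" but delivered to "alice+lists", and spam would be
# sent to the mailbox "alice+quarantine".
#
#EnablePlusedDetail     on
#PlusedCharacter        +
#QuarantineMailbox      +quarantine
#
# If PlusedCharacter were changed to another character (assuming, for the
# sake of the example, that '-' is acceptable), the QuarantineMailbox value
# would need the same prefix:
#
#PlusedCharacter        -
#QuarantineMailbox      -quarantine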

#
# OnFail: What to do if local delivery or quarantine should fail. If set
# to "unlearn", DSPAM will unlearn the message prior to exiting with an
# unsuccessful return code. The default option, "error", will not unlearn
# the message but return the appropriate error code. The unlearn option
# is useful on some systems where local delivery failures will cause the
# message to be requeued for delivery, and could result in the message
# being processed multiple times. During a very large failure, however,
# this could cause a significant load increase.
#
OnFail error

#
# Trusted Users: Only the users specified below will be allowed to perform
# administrative functions in DSPAM such as setting the active user and
# accessing tools. All other users attempting to run DSPAM will be restricted;
# their uids will be forced to match the active username and they will not be
# able to specify delivery agent privileges or use tools.
#
Trust root
Trust dspam
#Trust apache
#Trust mail
#Trust mailnull
#Trust smmsp
#Trust daemon
Trust vmail
#Trust nobody
#Trust majordomo

#
# Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must
# be compiled with debug support in order to use this option. DSPAM should
# never be running in production with debug active unless you are
# troubleshooting problems.
#
# DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus
#   process     standard message processing
#   classify    message classification using --classify
#   spam        error correction of missed spam
#   fp          error correction of false positives
#   inoculation message inoculations (source=inoculation)
#   corpus      corpusfed messages (source=corpus)
#
#Debug *
#Debug bob bill
#
#DebugOpt process spam fp

#
# ClassAlias: Alias a particular class to spam/nonspam. This is useful if
# classifying things other than spam.
#
#ClassAliasSpam badstuff
#ClassAliasNonspam goodstuff

#
# Training Mode: The default training mode to use for all operations, when
# one has not been specified on the commandline or in the user's preferences.
# Acceptable values are:
#     toe     Train on Error (Only)
#     teft    Train Everything (Trains on every message)
#     tum     Train Until Mature (Train only tokens without enough data)
#     notrain Do not train or store signatures (large ISP systems, post-train)
#
TrainingMode teft

#
# TestConditionalTraining: By default, dspam will retrain certain errors
# until the condition is no longer met. This usually accelerates learning.
# Some people argue that this can increase the risk of errors, however.
#
TestConditionalTraining on

#
# Features: Specify features to activate by default; can also be specified
# on the commandline. See the documentation for a list of available features.
# If _any_ features are specified on the commandline, these are ignored.
#
#Feature noise
Feature whitelist

# Training Buffer: The training buffer waters down statistics during training.
# It is designed to prevent false positives, but can also dramatically reduce
# dspam's catch rate during initial training. This can be a number from 0
# (no buffering) to 10 (maximum buffering). If you are paranoid about false
# positives, you should probably enable this option.
#
#Feature tb=5

#
# Algorithms: Specify the statistical algorithms to use, overriding any
# defaults configured in the build. The options are:
#    naive       Naive-Bayesian (All Tokens)
#    graham      Graham-Bayesian ("A Plan for Spam")
#    burton      Burton-Bayesian (SpamProbe)
#    robinson    Robinson's Geometric Mean Test (Obsolete)
#    chi-square  Fisher-Robinson's Chi-Square Algorithm
#
# You may have multiple algorithms active simultaneously, but it is strongly
# recommended that you group Bayesian algorithms with other Bayesian
# algorithms, and any use of Chi-Square remain exclusive.
#
# NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider
#       using 'burton' for slightly better accuracy
#
# Don't mess with this unless you know what you're doing
#
#Algorithm chi-square
#Algorithm naive
Algorithm graham burton

#
# Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
# responsible for parsing the message into individual tokens. Depending on
# how many resources you are willing to trade off vs. accuracy, you may
# choose to use a less or more detailed tokenizer:
#   word    uniGram (single word) tokenizer
#           Tokenizes message into single individual words/tokens
#           example: "free" and "viagra"
#   chain   biGram (chained tokens) tokenizer (default)
#           Single words + chains adjacent tokens together
#           example: "free" and "viagra" and "free viagra"
#   sbph    Sparse Binary Polynomial Hashing tokenizer
#           Creates sparse token patterns across sliding window of 5-tokens
#           example: "the quick * fox jumped" and "the * * fox jumped"
#   osb     Orthogonal Sparse biGram tokenizer
#           Similar to SBPH, but only uses the biGrams
#           example: "the * * fox" and "the * * * jumped"
#
# In general the recommendation is to use 'osb' for new installations.
# The default value of 'chain' remains here so as not to surprise anyone
# upgrading who has not changed from the default value.
#
Tokenizer chain

#
# PValue: Specify the technique used for calculating Probability Values,
# overriding any defaults configured in the build. These options are:
#    bcr         Bayesian Chain Rule (Graham's Technique - "A Plan for Spam")
#    robinson    Robinson's Technique (used in Chi-Square)
#    markov      Markovian Weighted Technique (for Markovian discrimination)
#
# Unlike the "Algorithms" property, you may only have one of these defined.
# Use of the chi-square algorithm automatically changes this to robinson.
#
# Don't mess with this unless you know what you're doing.
#
#PValue robinson
#PValue markov
PValue bcr

#
# WebStats: Enable this if you are using the CGI, which writes .stats files
WebStats off

#
# ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to
# X-DSPAM-Improbability headers
#
#ImprobabilityDrive on

#
# Preferences: Specify any preferences to set by default, unless otherwise
# overridden by the user (see next section) or a default.prefs file.
# If user or default.prefs are found, the user's preferences will override any
# defaults.
#
Preference "trainingMode=TEFT"          # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
Preference "spamAction=deliver"         # { quarantine | tag | deliver } -> default:quarantine
Preference "spamSubject=[SPAM]"         # { string } -> default:[SPAM]
Preference "statisticalSedation=0"      # { 0 - 10 } -> default:0
Preference "enableBNR=off"              # { on | off } -> default:off
Preference "enableWhitelist=on"         # { on | off } -> default:on
Preference "signatureLocation=headers"  # { message | headers } -> default:message
Preference "tagSpam=off"                # { on | off }
Preference "tagNonspam=off"             # { on | off }
Preference "showFactors=off"            # { on | off } -> default:off
#Preference "optIn=off"                 # { on | off }
#Preference "optOut=on"                 # { on | off }
Preference "whitelistThreshold=10"      # { Integer } -> default:10
Preference "makeCorpus=off"             # { on | off } -> default:off
Preference "storeFragments=off"         # { on | off } -> default:off
Preference "localStore="                # { on | off } -> default:username
Preference "processorBias=on"           # { on | off } -> default:on
Preference "fallbackDomain=off"         # { on | off } -> default:off
Preference "trainPristine=off"          # { on | off } -> default:off
Preference "optOutClamAV=off"           # { on | off } -> default:off
Preference "ignoreRBLLookups=off"       # { on | off } -> default:off
Preference "RBLInoculate=off"           # { on | off } -> default:off
Preference "notifications=off"          # { on | off } -> default:off

#
# Overrides: Specifies the user preferences which may override configuration
# and commandline defaults. Any other preferences supplied by an untrusted user
# will be ignored.
#
AllowOverride enableBNR
AllowOverride showFactors
AllowOverride statisticalSedation
AllowOverride trainingMode
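#
# Illustration only (not part of the original configuration): going by the
# comments above, a per-user value for one of the preferences listed under
# AllowOverride would take the place of the matching Preference default.
# Using the dspam_admin syntax shown under FallbackDomains and the
# hypothetical user "alice", such a per-user value might be set with:
#
#   dspam_admin ch pref alice trainingMode TOE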

# --- MySQL ---

#
# Storage driver settings: Specific to a particular storage driver. Uncomment
# the configuration specific to your installation, if applicable.
#
MySQLServer     /var/run/mysqld/mysqld.sock
#MySQLPort
MySQLUser               dspam
MySQLDb                 dspam
MySQLCompress           false
MySQLReconnect          true

# If you are using replication for clustering, you can also specify a separate
# server to perform all writes to.
#
#MySQLWriteServer       /var/lib/mysql/mysql.sock
#MySQLWritePort         
#MySQLWriteUser         dspam
#MySQLWritePass         changeme
#MySQLWriteDb           dspam_write
MySQLCompress           false
#MySQLReconnect         true

# If your replication isn't close to real-time, your retraining might fail if
# the signature isn't found. One workaround for this is to use the write
# database for all signature reads:
#
#MySQLReadSignaturesFromWriteDb on

# If you're running DSPAM in client/server (daemon) mode, uncomment the
# setting below to override the default connection cache size (the number
# of connections the server pools between all clients). The connection cache
# represents the maximum number of database connections *available* and should
# be set based on the maximum number of concurrent connections you're likely
# to have. Each connection may be used by only one thread at a time, so all
# other threads _will block_ until another connection becomes available.
#
#MySQLConnectionCache   10

# If you're using vpopmail or some other type of virtual setup and wish to
# change the table dspam uses to perform username/uid lookups, you can
# override it below

#MySQLVirtualTable              dspam_virtual_uids
#MySQLVirtualUIDField           uid
#MySQLVirtualUsernameField      username

# UIDInSignature: MySQL supports the insertion of the user id into the DSPAM
# signature. This allows you to create one single spam or fp alias
# (pointing to some arbitrary user), and the uid in the signature will
# switch to the correct user. Result: you need only one spam alias

MySQLUIDInSignature    on

# --- PostgreSQL ---

# For PgSQLServer you can use a TCP/IP address or a socket. If your socket is
# in /var/run/postgresql/.s.PGSQL.5432 specify just the path where the socket
# resides (without .s.PGSQL.5432).

#PgSQLServer            /var/run/postgresql/
#PgSQLPort              
#PgSQLUser              dspam
#PgSQLPass              changeme
#PgSQLDb                dspam

# If you're running DSPAM in client/server (daemon) mode, uncomment the
# setting below to override the default connection cache size (the number
# of connections the server pools between all clients).
#
#PgSQLConnectionCache   3

# UIDInSignature: PgSQL supports the insertion of the user id into the DSPAM
# signature. This allows you to create one single spam or fp alias
# (pointing to some arbitrary user), and the uid in the signature will
# switch to the correct user. Result: you need only one spam alias

#PgSQLUIDInSignature    on

# If you're using vpopmail or some other type of virtual setup and wish to
# change the table dspam uses to perform username/uid lookups, you can
# override it below

#PgSQLVirtualTable              dspam_virtual_uids
#PgSQLVirtualUIDField           uid
#PgSQLVirtualUsernameField      username

# --- SQLite ---

#SQLitePragma           "synchronous = OFF"

# --- Hash ---

#
# HashRecMax: Default number of records to create in the initial segment when
# building hash files. 100,000 yields files 1.6MB in size, but can fill up
# fast, so be sure to increase this (to a million or more) if you're not using
# autoextend.
#
# NOTE: If you're using a heavy-weight tokenizer, such as SBPH, you should be
#       looking for settings in the 'millions' of records.
#
# Primes List:
#  53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613,
#  393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653,
#  100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473,
#  4294967291
#
HashRecMax              98317

#
# HashAutoExtend: Autoextend hash databases when they fill up. This allows
# them to continue to train by adding extents (extensions) to the file. There
# will be a small delay during the growth process, as everything needs to be
# closed and remapped.
#
HashAutoExtend          on

#
# HashMaxExtents: The maximum number of extents that may be created in a single
# hash file. Set this to zero for unlimited
#
HashMaxExtents          0

#
# HashExtentSize: The initial record size for newly created extents. Creating
# this too small could result in many extents being created. Creating this too
# large could result in excessive disk space usage. Typically, a value close
# to half of the HashRecMax size is good.
#
HashExtentSize          49157

#
# HashPctIncrease: Increase the next extent size by n% from the size of the
# last extent. This is useful in accommodating systems where the default
# HashExtentSize can be too small for certain high-volume users, and can also
# help keep seeks nice and speedy and/or prevent too many unnecessary extents
# from being created when using a low HashMaxSeek. The default behavior, when
# HashPctIncrease is not used, is to always use HashExtentSize with no
# increase.
#
HashPctIncrease         10
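#
# Worked example (illustration only): with the values set in this file,
# HashExtentSize 49157 and HashPctIncrease 10, and assuming the first extent
# uses HashExtentSize unchanged and each later extent is 10% larger than the
# one before it, successive extents would hold roughly:
#
#   extent 1:  49157 records
#   extent 2:  49157 * 1.10 = ~54073 records
#   extent 3:  54073 * 1.10 = ~59480 records
#
# The exact rounding is up to the hash driver and is not specified here.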

#
# HashMaxSeek: The maximum number of record seeks when inserting a new record
# before failing or adding a new extent. This ultimately translates into the
# max # of acceptable seeks per segment. Setting this too high will exhaustively
# scan each segment and hurt performance. Typically, a low value is acceptable
# as even older extents will continue to fill as training progresses.
#
HashMaxSeek             10

#
# HashConcurrentUser: If you are using a single, stateful hash database in
# daemon mode, specifying a concurrent user below will cause the user to be
# permanently mapped into memory and shared via rwlocks. This is very fast and
# very cool if you are running a "userless" relay appliance.
#
#HashConcurrentUser     user

#
# HashConnectionCache: If running in daemon mode, this is the max # of
# concurrent connections that will be supported. NOTE: If you are using
# HashConcurrentUser, this option is ignored, as all connections are read-
# write locked instead of mutex locked.
#
HashConnectionCache     10


# --- ExtLookup ---

# ExtLookup: Perform various external lookup functions depending on user-
# defined variables. ExtLookup can either be set to 'on' or 'off'. The
# behavior of such lookups is defined by the use of ExtLookupMode, which
# can be set to 'verify', 'map' and 'strict'.
#
#  verify   Will cause dspam to validate the user, prior to
#           creating the user entry in the system.
#
#  map      Will cause dspam to try to map the user address
#           to a certain unique identifier.
#
#  strict   Will cause dspam to enforce both 'verify' and 'map'.
#
# ExtLookupDriver will set the engine behind the lookups. For now the only
# supported mechanisms are 'ldap' and 'program'. The first will make dspam
# talk directly to the configured LDAP server. The second will perform the
# various lookup functions by running a certain binary program or executable
# script. The program MUST be a binary executable or a script with a well
# defined interpreter in its first line ( #!/path/to/interpreter ). There
# are plans to support TLS/SSL connections to backend databases.
#
#ExtLookup              on                              # Turns on/off external lookup
#ExtLookupMode          strict                          # available modes are 'verify', 'map' and 'strict'.
                                                        # 'strict' enforces both verify and map
#ExtLookupDriver        ldap                            # Currently only ldap and program are supported.
                                                        # There are plans to support both MySQL and Postgres.
#ExtLookupServer        ldap.domain.com                 # Can either be a database hostname or the full path to
                                                        # an executable lookup program and its arguments.
#ExtLookupPort          389                             # Desired port when connecting to the lookup database.
#ExtLookupDB            "ou=Users,dc=domain,dc=com"     # Can either be an LDAP search base or a database name (TODO).
#ExtLookupQuery         "(&(objectClass=qmailUser)(|(mail=%u)(mailAlternateAddress=%u)))"       # Can either be an LDAP search filter or an SQL query (TODO)
#ExtLookupLDAPAttribute "mail"                          # Attribute to be used when ExtLookupDriver is 'ldap'
                                                        # and ExtLookupMode 'map' or 'strict'
#ExtLookupLDAPScope     sub                             # Can be set to 'base', 'sub' or 'one'. Only used when ExtLookupDriver is 'ldap'.
#ExtLookupLDAPVersion   3                               # Sets the LDAP protocol version (1, 2 or 3)
#ExtLookupLogin         "cn=admin,dc=domain,dc=com"     # Login to be used when connecting to any direct database backend.
#ExtLookupPassword      itsasecret                      # Password to use with ExtLookupLogin.
#ExtLookupCrypto        tls                             # Sets the use of TLS on backend communication (only compatible with LDAPv3)


# --- Profiles ---

#
# You can specify multiple storage profiles, and specify the server to
# use on the commandline with --profile. For example:
#
#Profile DECAlpha
#MySQLServer.DECAlpha   10.0.0.1
#MySQLPort.DECAlpha     3306
#MySQLUser.DECAlpha     dspam
#MySQLPass.DECAlpha     changeme
#MySQLDb.DECAlpha       dspam
#MySQLCompress.DECAlpha true
#MySQLReconnect.DECAlpha        true
#
#Profile Sun420R
#MySQLServer.Sun420R    10.0.0.2
#MySQLPort.Sun420R      3306
#MySQLUser.Sun420R      dspam
#MySQLPass.Sun420R      changeme
#MySQLDb.Sun420R        dspam
#MySQLCompress.Sun420R  false
#MySQLReconnect.Sun420R true
#
#DefaultProfile DECAlpha

#
# If you're using storage profiles, you can set failovers for each profile.
# Of course, if you'll be failing over to another database, that database
# must have the same information as the first. If you're using a global
# database with no training, this should be relatively simple. If you're
# configuring per-user data, however, you'll need to set up some type of
# replication between databases.
#
#Failover.DECAlpha      Sun420R
#Failover.Sun420R       DECAlpha

# If the storage fails, the agent will follow each profile's failover up to
# a maximum number of failover attempts. This should be set to a maximum of
# the number of profiles you have, otherwise the agent could loop and try
# the same profile multiple times (unless this is your desired behavior).
#
#FailoverAttempts       1

#
# Ignored headers: If DSPAM is behind other tools which may add a header to
# incoming emails, it may be beneficial to ignore these headers - especially
# if they are coming from another spam filter. If you are _not_ using one of
# these tools, however, leaving the appropriate headers commented out will
# allow DSPAM to use them as telltale signs of forged email.
#
#IgnoreHeader X-Spam-Status
#IgnoreHeader X-Spam-Scanned
#IgnoreHeader X-Virus-Scanner-Result

#
# Lookup: Perform lookups on streamlined blackhole list servers (see
# http://www.nuclearelephant.com/projects/sbl/). The streamlined blacklist
# server is a machine-automated, unsupervised blacklisting system designed to
# provide real-time and highly accurate blacklisting based on network spread.
# When performing a lookup, DSPAM will automatically learn the inbound message
# as spam if the source IP is listed. Until an official public RABL server is
# available, this feature is only useful if you are running your own
# streamlined blackhole list server for internal reporting among multiple mail
# servers. Provide the name of the lookup zone below to use.
#
# This function performs standard reverse-octet.domain lookups, and while it
# will function with many RBLs, it's strongly discouraged to use those
# maintained by humans as they're often inaccurate and could hurt filter
# learning and accuracy.
#
#Lookup         "sbl.yourdomain.com"
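#
# Illustration only: in a reverse-octet.domain lookup, the octets of the
# source IP are reversed and prepended to the lookup zone. For the
# placeholder address 192.0.2.25 and the zone above, the name queried would
# be:
#
#   25.2.0.192.sbl.yourdomain.com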

#
# RBLInoculate: If you want to inoculate the user from RBL'd messages it would
# have otherwise missed, set this to on.
#
#RBLInoculate   off

#
# Notifications: Enable the sending of notification emails to users (first
# message, quarantine full, etc.)
#
Notifications   off

#
# QuarantineWarnSize: You may specify a size when DSPAM should send a
# "Quarantine Full" message to each user. This only works if you enable
# notifications (see above). Value is in bytes. Default is 2097152 -> 2MB.
#
#QuarantineWarnSize 2097152

#
# Purge configuration: Set dspam_clean purge default options, if not otherwise
# specified on the commandline
#
#PurgeSignatures 14     # Stale signatures
#PurgeNeutral   90      # Tokens with neutralish probabilities
#PurgeUnused    90      # Unused tokens
#PurgeHapaxes   30      # Tokens with less than 5 hits (hapaxes)
#PurgeHits1S    15      # Tokens with only 1 spam hit
#PurgeHits1I    15      # Tokens with only 1 innocent hit

#
# Purge configuration for SQL-based installations using purge.sql
#
PurgeSignature  off # Specified in purge.sql
PurgeNeutral   90
PurgeUnused    off # Specified in purge.sql
PurgeHapaxes   off # Specified in purge.sql
PurgeHits1S    off # Specified in purge.sql
PurgeHits1I    off # Specified in purge.sql

#
# Local Mail Exchangers: Used for source address tracking, tells DSPAM which
# mail exchangers are local and therefore should be ignored in the Received:
# header when tracking the source of an email. Note: you should use the address
# of the host as appears between brackets [ ] in the Received header.
# By default DSPAM always treats the following IP ranges as LocalMX:
#       10.0.0.0/8      - Private IP addresses (RFC 1918)
#       127.0.0.0/8     - Localhost Loopback Address (RFC 1700)
#       169.254.0.0/16  - Zeroconf / APIPA (RFC 3330)
#       172.16.0.0/12   - Private IP addresses (RFC 1918)
#       192.168.0.0/16  - Private IP addresses (RFC 1918)
#
LocalMX 127.0.0.1

#
# Logging: Disabling logging for users will make usage graphs unavailable to
# them. Disabling system logging will make admin graphs unavailable.
#
SystemLog       on
UserLog         on

#
# TrainPristine: for systems where the original message remains server side
# and can therefore be presented in pristine format for retraining. This option
# will cause DSPAM to cease all writing of signatures and DSPAM headers to the
# message, and deliver the message in as pristine format as possible. This mode
# REQUIRES that the original message in its pristine format (as of delivery)
# be presented for retraining, as in the case of webmail, imap, or other
# applications where the message is actually kept server-side during reading,
# and is preserved. DO NOT use this switch unless the original message can be
# presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS.
#
# NOTE: You can't use this setting with dspam_train; if you're going to use it,
#       wait until after you train any corpora.
#
#TrainPristine on

#
# Opt: in or out; determines DSPAM's default filtering behavior. If this value
# is set to in, users must opt-in to filtering by dropping a .dspam file in
# /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam
# folder in their home directory).  The default is opt-out, which means all
# users will be filtered unless a .nodspam file is dropped in
# /var/dspam/opt-out/user.nodspam
#
Opt out

#
# TrackSources: specify which (if any) source addresses to track and report
# them to syslog (mail.info). This is useful if you're running a firewall or
# blacklist and would like to use this information. Spam reporting also drops
# RABL blacklist files (see http://www.nuclearelephant.com/projects/rabl/).
#
#TrackSources spam nonspam virus

#
# ParseToHeaders: In lieu of setting up individual aliases for each user,
# DSPAM can be configured to automatically parse the To: address for spam and
# false positive forwards. From there, it can be configured to either set the
# DSPAM user based on the username specified in the header and/or change the
# training class and source accordingly. The options below can be used to
# customize most common types of header parsing behavior to avoid the need for
# multiple aliases, or if using LMTP, aliases entirely.
#
# ParseToHeaders: Parse the To: headers of an incoming message. This must be
#                set to 'on' to use either of the following features.
#
# ChangeModeOnParse: Automatically change the class (to spam or innocent)
#   depending on whether spam- or notspam- was specified, and change the source
#   to 'error'. This is convenient if you're not using aliases at all, but
#   are delivering via LMTP.
#
# ChangeUserOnParse: Automatically change the username to match that specified
#   in the To: header. For example, spam-...@domain.tld will set the username
#   to bob, ignoring any --user passed in. This may not always be desirable if
#   you are using virtual email addresses as usernames. Options:
#     on or user        take the portion before the @ sign only
#     full              take everything after the initial {spam,notspam}-.
#
ParseToHeaders off
ChangeModeOnParse off
ChangeUserOnParse off
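#
# Illustration only (not part of the original configuration): a sketch of the
# options above turned on, using the hypothetical address
# spam-alice@example.org. Going by the descriptions above, a message
# forwarded to that address would be retrained as spam with source 'error'
# for the user "alice"; with ChangeUserOnParse set to 'full', the user would
# be "alice@example.org" instead.
#
#ParseToHeaders on
#ChangeModeOnParse on
#ChangeUserOnParse on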

#
# Broken MTA Options: Some MTAs don't support the proper functionality
# necessary. In these cases you can activate certain features in DSPAM to
# compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if
# the message is spam, 0 if not, or a negative code if an error has occurred.
# Specifying 'case' causes DSPAM to force the input usernames to lowercase.
# Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
# in.
#
#Broken returnCodes
#Broken case
#Broken lineStripping

#
# MaxMessageSize: You may specify a maximum message size for DSPAM to process.
# If the message is larger than the maximum size, it will be delivered
# without processing. Value is in bytes (4194304 bytes = 4 MiB).
#
#MaxMessageSize 4194304

# --- ClamAV ---

#
# Virus Checking: If you are running clamd, DSPAM can perform stream-based
# virus checking using TCP. Uncomment the values below to enable virus
# checking.
#
# ClamAVResponse: reject (reject or drop the message with a permanent failure)
#                 accept (accept the message, then quietly drop it)
#                 spam   (treat as spam and quarantine/tag/whatever)
#
#ClamAVPort             3310
#ClamAVHost             127.0.0.1
#ClamAVResponse         accept

# --- CLIENT / SERVER ---

#
# Daemonized Server: If you are running DSPAM as a daemonized server using
# --daemon, the following parameters will override the default. Use the
# ServerPass option to set up accounts for each client machine. The DSPAM
# server will process and deliver the message based on the parameters
# specified. If you want the client machine to perform delivery, use
# the --stdout option in conjunction with a local setup.
#
# ServerHost: Not enabling ServerHost will bind DSPAM server to all available
# interfaces.
#
#ServerHost             127.0.0.1
#ServerPort             24
#ServerQueueSize        32
ServerPID              /var/run/dspam/dspam.pid

#
# ServerMode specifies the type of LMTP server to start. This can be one of:
#     dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc
#  standard: Standard LMTP server, for communicating with Postfix or other MTA
#      auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT
#
ServerMode auto

# If supporting DLMTP (dspam) mode, dspam clients will require authentication
# as they will be passing in parameters. The idents below will be used to
# determine which clients will be speaking DLMTP, so if you will be using
# both LMTP and DLMTP from the same host, be sure to use something other
# than the server's hostname below (which will be sent by the MTA during a
# standard LMTP LHLO).
# 
#ServerPass.Relay2      "password"

# If supporting standard LMTP mode, server parameters will need to be specified
# here, as they will not be passed in by the mail server. The ServerIdent
# specifies the 250 response code ident sent back to connecting clients and
# should be set to the hostname of your server, or an alias.
#
# NOTE: If you specify --user in ServerParameters, the RCPT TO will be
#       used only for delivery, and not set as the active user for processing.
#
ServerParameters        "--deliver=spam,innocent"
ServerIdent             "localhost.localdomain"
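#
# Example (a sketch; "shared" is a hypothetical username): per the NOTE
# above, adding --user should pin processing to one DSPAM user while the
# RCPT TO is still used for delivery. Left commented out.
#
#ServerParameters        "--user shared --deliver=spam,innocent"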

# If you wish to use a local domain socket instead of a TCP socket, uncomment
# the following. It is strongly recommended you use local domain sockets if
# you are running the client and server on the same machine, as it eliminates
# much of the bandwidth overhead.
#
ServerDomainSocketPath  "/var/run/dspam/dspam.sock"

#
# Client Mode: If you are running DSPAM in client/server mode, uncomment and
# set these variables. A ClientHost beginning with a / will be treated as
# a domain socket.
#
ClientHost      "/var/run/dspam/dspam.sock"
ClientIdent     "dspamrelay@Relay1"
#
#ClientHost     127.0.0.1
#ClientPort     24
#ClientIdent    "secret@Relay1"
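#
# Example pairing (a sketch; "Relay1" and "secret" are placeholder values).
# As far as I understand the format, ClientIdent is written as
# password@ident, and the server needs a ServerPass.<ident> entry holding
# the same password for that client to be accepted as a DLMTP client:
#
#ServerPass.Relay1      "secret"
#ClientIdent            "secret@Relay1"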

# --- RABL ---

# RABLQueue: Touch files in the RABL queue
# If you are a reporting streamlined blackhole list participant, you can
# touch ip addresses within the directory the rabl_client process is watching.
#
#RABLQueue      /var/spool/rabl

# ---  ---

# DataSource: If you are using any type of data source that does not include
# email-like headers (such as documents), uncomment the line below. This
# will cause the entire input to be treated like a message "body".
#
#DataSource document

# ProcessorWordFrequency: By default, words are only counted once per message.
# If you are classifying large documents, however, you may wish to count once
# per occurrence instead.
#
#ProcessorWordFrequency occurrence

# ProcessorURLContext: By default, a URL context is generated for URLs, which
# records their tokens as separate from words found in documents. To use
# URL tokens in the same context as words, turn this feature off.
#
ProcessorURLContext on

# ProcessorBias: Bias causes the filter to lean more toward 'innocent', and
# usually greatly reduces false positives. It is the default behavior of
# most Bayesian filters (including dspam).
#
# NOTE: You probably DON'T want this if you're using Markovian Weighting, unless
# you are paranoid about false positives.
#
ProcessorBias on

# StripRcptDomain: Cut the domain (including the at sign) from recipients.
# This is particularly useful if the recipient name is equal to real user
# accounts as recipients with domains tend to cause permission issues with
# dspam-web.
#
StripRcptDomain off

# --- Split Configuration File Support ---

# Include a directory with configuration items.
#Include /etc/dspam/dspam.d/

# ---  ---

## EOF
