[Dspam-user] Crashes in 3.9.0-ALPHA2 when retraining

Paulo J. S. Silva Thu, 25 Jun 2009 04:34:10 -0700

Hello,

I tried to upgrade to the new release. I use it in a simple install: I
am the only user of the system and I compile it myself. I am using the
hashdriver storage. At the end of the message I show the output of "dspam
--version". The system is Debian GNU/Linux 5.0.1 (lenny), and the compiler
is gcc 4.3.2.


I decided to start with a new database and an initial training on a
corpus of around 1500 spam and 1500 ham messages. During this training I
saw many "BROKEN result!!" messages and some crashes. Now, whenever I
try to retrain, I get crashes with messages such as "*** glibc
detected *** /home/mac/pjssilva/dspam/bin/dspam: free(): invalid
pointer: 0xffdbbf54 ***". I append a backtrace below.

If anyone wants, I can send my current hashdriver database.

Any hints?

Paulo

===== Output of dspam --version  =====

pjssi...@kama:~$ dspam --version

DSPAM Anti-Spam Suite 3.9.0-ALPHA2 (agent/library)

Copyright (c) 2002-2009 DSPAM Project
http://dspam.sourceforge.net.

DSPAM may be copied only under the terms of the GNU General Public
License,
a copy of which can be found with the DSPAM distribution kit.

Configuration parameters:  '--prefix=/home/mac/pjssilva/dspam'
'--sysconfdir=/home/mac/pjssilva/dspam/etc'

I am attaching my dspam.conf to this message.

====== A backtrace from a crash =====

*** glibc detected *** /home/mac/pjssilva/dspam/bin/dspam: free(): invalid pointer: 0xffde5774 ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6[0xf7e7c624]
/lib/i686/cmov/libc.so.6(cfree+0x96)[0xf7e7e826]
/home/mac/pjssilva/dspam/lib/libdspam.so.7(_ds_operate+0x385)[0xf7fb7585]
/home/mac/pjssilva/dspam/lib/libdspam.so.7(dspam_process+0x1d9)[0xf7fb8029]
/home/mac/pjssilva/dspam/bin/dspam(retrain_message+0x157)[0x804cc97]
/home/mac/pjssilva/dspam/bin/dspam(process_message+0x9e7)[0x8050247]
/home/mac/pjssilva/dspam/bin/dspam(process_users+0x676)[0x80513b6]
/home/mac/pjssilva/dspam/bin/dspam(main+0x332)[0x8051f22]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xf7e24455]
/home/mac/pjssilva/dspam/bin/dspam[0x804a6a1]
======= Memory map: ========
08048000-08056000 r-xp 00000000 00:12 51700445    /home/mac/pjssilva/dspam/bin/dspam
08056000-08057000 rw-p 0000d000 00:12 51700445    /home/mac/pjssilva/dspam/bin/dspam
08b71000-08c42000 rw-p 08b71000 00:00 0           [heap]
f5500000-f5521000 rw-p f5500000 00:00 0
f5521000-f5600000 ---p f5521000 00:00 0
f56e2000-f56ee000 r-xp 00000000 08:05 835599      /lib/libgcc_s.so.1
f56ee000-f56ef000 rw-p 0000b000 08:05 835599      /lib/libgcc_s.so.1
f570b000-f690c000 rw-s 00000000 00:12 62693458    /home/mac/pjssilva/dspam/var/dspam/data/pjssilva/pjssilva.css
f690c000-f7b0d000 rw-s 00000000 00:12 62693458    /home/mac/pjssilva/dspam/var/dspam/data/pjssilva/pjssilva.css
f7b0d000-f7b16000 r-xp 00000000 08:05 835631      /lib/i686/cmov/libnss_nis-2.7.so
f7b16000-f7b18000 rw-p 00008000 08:05 835631      /lib/i686/cmov/libnss_nis-2.7.so
f7b18000-f7b7e000 r-xp 00000000 08:05 3852839     /usr/lib/libgcrypt.so.11.4.4
f7b7e000-f7b80000 rw-p 00066000 08:05 3852839     /usr/lib/libgcrypt.so.11.4.4
f7b80000-f7b94000 r-xp 00000000 08:05 49321       /usr/lib/libz.so.1.2.3.3
f7b94000-f7b95000 rw-p 00013000 08:05 49321       /usr/lib/libz.so.1.2.3.3
f7b95000-f7b98000 r-xp 00000000 08:05 3850923     /usr/lib/libgpg-error.so.0.3.0
f7b98000-f7b99000 rw-p 00002000 08:05 3850923     /usr/lib/libgpg-error.so.0.3.0
f7b99000-f7ba8000 r-xp 00000000 08:05 3852837     /usr/lib/libtasn1.so.3.0.15
f7ba8000-f7ba9000 rw-p 0000e000 08:05 3852837     /usr/lib/libtasn1.so.3.0.15
f7ba9000-f7bab000 r-xp 00000000 08:05 835767      /lib/libkeyutils-1.2.so
f7bab000-f7bac000 rw-p 00001000 08:05 835767      /lib/libkeyutils-1.2.so
f7bac000-f7bb3000 r-xp 00000000 08:05 3861598     /usr/lib/libkrb5support.so.0.1
f7bb3000-f7bb4000 rw-p 00006000 08:05 3861598     /usr/lib/libkrb5support.so.0.1
f7bb4000-f7bd7000 r-xp 00000000 08:05 3861595     /usr/lib/libk5crypto.so.3.1
f7bd7000-f7bd8000 rw-p 00023000 08:05 3861595     /usr/lib/libk5crypto.so.3.1
f7bd8000-f7c6f000 r-xp 00000000 08:05 3852842     /usr/lib/libgnutls.so.26.4.6
f7c6f000-f7c75000 rw-p 00097000 08:05 3852842     /usr/lib/libgnutls.so.26.4.6
f7c75000-f7c8a000 r-xp 00000000 08:05 835652      /lib/i686/cmov/libpthread-2.7.so
f7c8a000-f7c8c000 rw-p 00014000 08:05 835652      /lib/i686/cmov/libpthread-2.7.so
f7c8c000-f7c8e000 rw-p f7c8c000 00:00 0
f7c8e000-f7c9e000 r-xp 00000000 08:05 835653      /lib/i686/cmov/libresolv-2.7.so
f7c9e000-f7ca0000 rw-p 0000f000 08:05 835653      /lib/i686/cmov/libresolv-2.7.so
f7ca0000-f7ca2000 rw-p f7ca0000 00:00 0
f7ca2000-f7cb7000 r-xp 00000000 08:05 835626      /lib/i686/cmov/libnsl-2.7.so
f7cb7000-f7cb9000 rw-p 00014000 08:05 835626      /lib/i686/cmov/libnsl-2.7.so
f7cb9000-f7cbb
sh: line 1:  4144 Aborted                 /home/mac/pjssilva/dspam/bin/dspam --user pjssilva --source=error --class=spam < 1245925879.M680871P7486V0000000000000013I03172611_2.ares,S=88893:2,S
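
As a quick sanity check on the dump above, the following sketch tests whether the pointer named in the free() error falls inside the [heap] mapping. The helper name `in_range` is hypothetical, the values are copied from this report, and the memory map is truncated, so this can only rule the heap in or out.

```python
def in_range(addr: int, mapping: str) -> bool:
    """Return True if addr lies in a 'start-end' range taken from a maps line."""
    start, end = (int(part, 16) for part in mapping.split("-"))
    return start <= addr < end

heap = "08b71000-08c42000"   # address field of the [heap] line in the memory map
freed = 0xffde5774           # pointer from the "free(): invalid pointer" message

print(in_range(freed, heap))  # False
```

A pointer outside the heap would be consistent with free() being called on memory that did not come from malloc(), although the truncated map does not show which region the address belongs to.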


-- 
Paulo José da Silva e Silva 
Professor Associado, Dep. de Ciência da Computação
(Associate Professor, Computer Science Dept.)
Universidade de São Paulo - Brazil

e-mail: [email protected]         Web: http://www.ime.usp.br/~pjssilva

Teoria é o que não entendemos o     (Theory is something we don't)
suficiente para chamar de prática.  (understand well enough to call
practice)

## $Id: dspam.conf.in,v 1.82 2006/06/23 03:11:31 jonz Exp $
## dspam.conf -- DSPAM configuration file
##

#
# DSPAM Home: Specifies the base directory to be used for DSPAM storage
#
Home /home/mac/pjssilva/dspam/var/dspam

#
# StorageDriver: Specifies the storage driver backend (library) to use.
# You'll only need to set this if you are using dynamic storage driver plugins
# from a binary distribution. The default build statically links the storage
# driver (when only one is specified at configure time), overriding this
# setting, which only comes into play if multiple storage drivers are specified
# at configure time. When using dynamic linking, be sure to include the path 
# to the library if necessary, and some systems may use an extension other 
# than .so (e.g. OSX uses .dylib).
#
# Options include:
#
#   libmysql_drv.so     libpgsql_drv.so   libsqlite_drv.so
#   libsqlite3_drv.so   libhash_drv.so
#
# IMPORTANT: Switching storage drivers requires more than merely changing
# this option. If you do not wish to lose all of your data, you will need to
# migrate it to the new backend before making this change.
#
StorageDriver /home/mac/pjssilva/dspam/lib/libhash_drv.so

#
# Trusted Delivery Agent: Specifies the local delivery agent DSPAM should call 
# when delivering mail as a trusted user. Use %u to specify the user DSPAM is 
# processing mail for. It is generally a good idea to allow the MTA to specify 
# the pass-through arguments at run-time, but they may also be specified here.
#
# Most operating system defaults:
#TrustedDeliveryAgent "/usr/bin/procmail"       # Linux
#TrustedDeliveryAgent "/usr/bin/mail"           # Solaris
#TrustedDeliveryAgent "/usr/libexec/mail.local" # FreeBSD
#TrustedDeliveryAgent "/usr/bin/procmail"       # Cygwin
#
# Other popular configurations:
#TrustedDeliveryAgent "/usr/cyrus/bin/deliver"  # Cyrus
#TrustedDeliveryAgent "/bin/maildrop"           # Maildrop
#TrustedDeliveryAgent "/usr/local/sbin/exim -oMr spam-scanned" # Exim
#
TrustedDeliveryAgent "/usr/bin/procmail"

#
# Untrusted Delivery Agent: Specifies the local delivery agent and arguments
# DSPAM should use when delivering mail and running in untrusted user mode.
# Because DSPAM will not allow pass-through arguments to be specified to 
# untrusted users, all arguments should be specified here. Use %u to specify
# the user DSPAM is processing mail for. This configuration parameter is only 
# necessary if you plan on allowing untrusted processing.
#
#UntrustedDeliveryAgent "/usr/bin/procmail -d %u"

#
# SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP 
# delivery to deliver your message to the mail server instead of using a
# delivery agent. You will need to configure with --enable-daemon to use host 
# delivery, however you do not need to operate in daemon mode. Specify an IP 
# address or UNIX path to a domain socket below as a host.
#
# If you would like to set up DeliveryHost's on a per-domain basis, use
# the syntax: DeliveryHost.domain.com 1.2.3.4
#
#DeliveryHost        127.0.0.1
#DeliveryPort        24
#DeliveryIdent       localhost
#DeliveryProto       LMTP

#
# FallbackDomains: If you want to specify certain domains as fallback domains,
# enable this option. For example, you could create a user @domain.com, and
# if [email protected] does not resolve to a known user on the system, the user
# could default to your @domain.com user. NOTE: This also requires designating
# fallbackDomain for the domain name; 
# e.g. dspam_admin ch pref domain.com fallbackDomain on 
#
#FallbackDomains on

#
# Quarantine Agent: DSPAM's default behavior is to quarantine all mail it 
# thinks is spam. If you wish to override this behavior, you may specify
# a quarantine agent which will be called with all messages DSPAM thinks are
# spam. Use %u to specify the user DSPAM is processing mail for.
#
#QuarantineAgent        "/usr/bin/procmail -d spam"

#
# DSPAM can optionally process "plused users" (addresses in the user+detail
# form) by truncating the username just before the "+", so all internal
# processing occurs for "user", but delivery will be performed for
# "user+detail". This is only useful if the LDA can handle "plused users"
# (for example Cyrus IMAP) and when configured for LMTP delivery above
#
# NOTE: Plused detail presently only works when usernames are provided and
#       not fully qualified email address (@domain).
#
#EnablePlusedDetail     on
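
The truncation described above can be illustrated with a minimal sketch. The function name `split_plused` is hypothetical and this is not DSPAM's own code; it only shows the "user" versus "user+detail" split on plain usernames.

```python
def split_plused(username: str):
    """Split 'user+detail' into ('user', 'detail'); detail is None without a '+'."""
    user, sep, detail = username.partition("+")
    return user, (detail if sep else None)

print(split_plused("user+detail"))  # ('user', 'detail')
print(split_plused("user"))         # ('user', None)
```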

#
# Quarantine Mailbox: DSPAM's LMTP code can send spam mail using LMTP to a 
# "plused" mailbox (such as user+quarantine) leaving quarantine processing
# for retraining or deletion to be performed by the LDA and the mail client.
# "plused" mailboxes are supported by Cyrus IMAP and possibly other LDAs.
# The mailbox name must have the +
#
#QuarantineMailbox      +quarantine

#
# OnFail: What to do if local delivery or quarantine should fail. If set
# to "unlearn", DSPAM will unlearn the message prior to exiting with an
# unsuccessful return code. The default option, "error" will not unlearn
# the message but return the appropriate error code. The unlearn option
# is useful on some systems where local delivery failures will cause the
# message to be requeued for delivery, and could result in the message
# being processed multiple times. During a very large failure, however, 
# this could cause a significant load increase.
#
OnFail error

#
# Trusted Users: Only the users specified below will be allowed to perform
# administrative functions in DSPAM such as setting the active user and
# accessing tools. All other users attempting to run DSPAM will be restricted;
# their uids will be forced to match the active username and they will not be
# able to specify delivery agent privileges or use tools.
#
Trust root
Trust mail
Trust mailnull 
Trust smmsp
Trust daemon
#Trust nobody
#Trust majordomo
Trust pjssilva
Trust renata

#
# Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must
# be compiled with debug support in order to use this option. DSPAM should
# never be running in production with debug active unless you are 
# troubleshooting problems.
#
# DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus
#   process     standard message processing
#   classify    message classification using --classify
#   spam        error correction of missed spam
#   fp          error correction of false positives
#   inoculation message inoculations (source=inoculation)
#   corpus      corpusfed messages (source=corpus)
#
#Debug *
#Debug bob bill
#
#DebugOpt process spam fp

#
# ClassAlias: Alias a particular class to spam/nonspam. This is useful if
# classifying things other than spam.
#
#ClassAliasSpam badstuff
#ClassAliasNonspam goodstuff

#
# Training Mode: The default training mode to use for all operations, when
# one has not been specified on the commandline or in the user's preferences.
# Acceptable values are: 
#     toe     Train on Error (Only)
#     teft    Train Everything (Trains on every message)
#     tum     Train Until Mature (Train only tokens without enough data)
#     notrain Do not train or store signatures (large ISP systems, post-train)
#
TrainingMode toe

#
# TestConditionalTraining: By default, dspam will retrain certain errors
# until the condition is no longer met. This usually accelerates learning.
# Some people argue that this can increase the risk of errors, however.
#
TestConditionalTraining on

#
# Features: Specify features to activate by default; can also be specified
# on the commandline. See the documentation for a list of available features.
# If _any_ features are specified on the commandline, these are ignored.
#
Feature noise
Feature whitelist

# Training Buffer: The training buffer waters down statistics during training.
# It is designed to prevent false positives, but can also dramatically reduce
# dspam's catch rate during initial training. This can be a number from 0
# (no buffering) to 10 (maximum buffering). If you are paranoid about false
# positives, you should probably enable this option.
#
#Feature tb=5

#
# Algorithms: Specify the statistical algorithms to use, overriding any
# defaults configured in the build. The options are:
#    naive       Naive-Bayesian (All Tokens)
#    graham      Graham-Bayesian ("A Plan for Spam")
#    burton      Burton-Bayesian (SpamProbe)
#    robinson    Robinson's Geometric Mean Test (Obsolete)
#    chi-square  Fisher-Robinson's Chi-Square Algorithm
#
# You may have multiple algorithms active simultaneously, but it is strongly
# recommended that you group Bayesian algorithms with other Bayesian
# algorithms, and any use of Chi-Square remain exclusive.
#
# NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider
#       using 'burton' for slightly better accuracy
#
# Don't mess with this unless you know what you're doing
#
#Algorithm chi-square
#Algorithm naive
Algorithm graham burton

#
# Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
# responsible for parsing the message into individual tokens. Depending on
# how many resources you are willing to trade off vs. accuracy, you may
# choose to use a less or more detailed tokenizer:
#   word    uniGram (single word) tokenizer
#           Tokenizes message into single individual words/tokens
#           example: "free" and "viagra"
#   chain   biGram (chained tokens) tokenizer (default)
#           Single words + chains adjacent tokens together
#           example: "free" and "viagra" and "free viagra"
#   sbph    Sparse Binary Polynomial Hashing tokenizer
#           Creates sparse token patterns across sliding window of 5-tokens
#           example: "the quick * fox jumped" and "the * * fox jumped"
#   osb     Orthogonal Sparse biGram
#           Similar to SBPH, but only uses the biGrams
#           example: "the * * fox" and "the * * * jumped"
#
Tokenizer chain
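
To make the difference between the `word` and `chain` tokenizers concrete, here is a rough sketch that reproduces the examples in the comment above. It assumes simple whitespace splitting and hypothetical function names; DSPAM's real tokenizer handles headers, delimiters, and hashing that are not modeled here.

```python
def word_tokens(text: str) -> list:
    """uniGram: every whitespace-separated word is one token."""
    return text.split()

def chain_tokens(text: str) -> list:
    """biGram: single words plus each pair of adjacent words."""
    words = word_tokens(text)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs

print(word_tokens("free viagra"))   # ['free', 'viagra']
print(chain_tokens("free viagra"))  # ['free', 'viagra', 'free viagra']
```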

#
# PValue: Specify the technique used for calculating Probability Values, 
# overriding any defaults configured in the build. These options are:
#    bcr         Bayesian Chain Rule (Graham's Technique - "A Plan for Spam")
#    robinson    Robinson's Technique (used in Chi-Square) 
#    markov      Markovian Weighted Technique (for Markovian discrimination)
#
# Unlike the "Algorithms" property, you may only have one of these defined. 
# Use of the chi-square algorithm automatically changes this to robinson.
#
# Don't mess with this unless you know what you're doing.
#
#PValue robinson
#PValue markov
PValue bcr
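
The `bcr` comment points at Graham's technique from "A Plan for Spam". As a rough illustration, the sketch below follows the per-token formula published in that essay; it is not a claim about how DSPAM computes the value internally. The function name is hypothetical and both corpus sizes are assumed to be greater than zero.

```python
def graham_token_probability(good: int, bad: int, ngood: int, nbad: int):
    """Per-token spam probability following the formula in "A Plan for Spam".

    good/bad: times the token was seen in ham/spam;
    ngood/nbad: number of ham/spam messages in the corpus (assumed > 0).
    """
    g = 2 * good              # ham counts are doubled to bias against false positives
    b = bad
    if g + b < 5:             # too little data to say anything about this token
        return None
    spam_freq = min(1.0, b / nbad)
    ham_freq = min(1.0, g / ngood)
    return max(0.01, min(0.99, spam_freq / (ham_freq + spam_freq)))

# A token seen 3 times in 1500 ham and 30 times in 1500 spam messages.
print(graham_token_probability(3, 30, 1500, 1500))
```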

#
# WebStats: Enable this if you are using the CGI, which writes .stats files
WebStats off

#
# ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to
# X-DSPAM-Improbability headers
#
ImprobabilityDrive on

#
# Preferences: Specify any preferences to set by default, unless otherwise
# overridden by the user (see next section) or a default.prefs file.
# If user or default.prefs are found, the user's preferences will override any
# defaults.
#
Preference "spamAction=deliver"
Preference "signatureLocation=headers"  # 'message' or 'headers'
Preference "showFactors=on"
#Preference "spamAction=tag"
#Preference "spamSubject=SPAM"

#
# Overrides: Specifies the user preferences which may override configuration
# and commandline defaults. Any other preferences supplied by an untrusted user
# will be ignored.
#
AllowOverride trainingMode
AllowOverride spamAction spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride signatureLocation
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold

# --- MySQL ---

#
# Storage driver settings: Specific to a particular storage driver. Uncomment
# the configuration specific to your installation, if applicable.
#
#MySQLServer            /var/lib/mysql/mysql.sock
#MySQLPort
#MySQLUser              dspam
#MySQLPass              changeme
#MySQLDb                dspam
#MySQLCompress          true

# If you are using replication for clustering, you can also specify a separate
# server to perform all writes to.
#
#MySQLWriteServer       /var/lib/mysql/mysql.sock
#MySQLWritePort         
#MySQLWriteUser         dspam
#MySQLWritePass         changeme
#MySQLWriteDb           dspam_write
#MySQLCompress          true

# If your replication isn't close to real-time, your retraining might fail if 
# the  signature isn't found. One workaround for this is to use the write
# database for all signature reads:
#
#MySQLReadSignaturesFromWriteDb on

# Use this if you have the 4.1 quote bug (see doc/mysql.txt)
#MySQLSupressQuote      on

# If you're running DSPAM in client/server (daemon) mode, uncomment the
# setting below to override the default connection cache size (the number
# of connections the server pools between all clients). The connection cache
# represents the maximum number of database connections *available* and should
# be set based on the maximum number of concurrent connections you're likely
# to have. Each connection may be used by only one thread at a time, so all
# other threads _will block_ until another connection becomes available.
#
#MySQLConnectionCache   10

# If you're using vpopmail or some other type of virtual setup and wish to
# change the table dspam uses to perform username/uid lookups, you can
# override it below

#MySQLVirtualTable          dspam_virtual_uids
#MySQLVirtualUIDField       uid
#MySQLVirtualUsernameField  username

# UIDInSignature: MySQL supports the insertion of the user id into the DSPAM 
# signature. This allows you to create one single spam or fp alias 
# (pointing to some arbitrary user), and the uid in the signature will
# switch to the correct user. Result: you need only one spam alias 

#MySQLUIDInSignature    on

# --- PostgreSQL ---

#PgSQLServer            127.0.0.1
#PgSQLPort              5432
#PgSQLUser              dspam
#PgSQLPass              changeme
#PgSQLDb                dspam

# If you're running DSPAM in client/server (daemon) mode, uncomment the
# setting below to override the default connection cache size (the number
# of connections the server pools between all clients).
#
#PgSQLConnectionCache   3

# UIDInSignature: PgSQL supports the insertion of the user id into the DSPAM 
# signature. This allows you to create one single spam or fp alias 
# (pointing to some arbitrary user), and the uid in the signature will
# switch to the correct user. Result: you need only one spam alias

#PgSQLUIDInSignature    on 

# If you're using vpopmail or some other type of virtual setup and wish to
# change the table dspam uses to perform username/uid lookups, you can
# override it below

#PgSQLVirtualTable          dspam_virtual_uids
#PgSQLVirtualUIDField       uid
#PgSQLVirtualUsernameField  username

# --- SQLite ---

#SQLitePragma   "synchronous = OFF"

# --- Hash ---

#
# HashRecMax: Default number of records to create in the initial segment when
# building hash files. 100,000 yields files 1.6MB in size, but can fill up
# fast, so be sure to increase this (to a million or more) if you're not using
# autoextend.
#
# NOTE: If you're using a heavy-weight tokenizer, such as SBPH, you should be
#       looking for settings in the 'millions' of records.
#
# Primes List:
#  53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613,
#  393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 
#  100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, 
#  4294967291
#
HashRecMax              786433
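
The comment above implies about 16 bytes per record (100,000 records for 1.6MB). Under that assumption, and ignoring any file header overhead, the size of the initial segment for a given HashRecMax can be estimated as follows. The constant and function names are hypothetical.

```python
# Inferred from "100,000 yields files 1.6MB in size"; header overhead is ignored.
BYTES_PER_RECORD = 1_600_000 // 100_000   # 16

def segment_bytes(records: int) -> int:
    """Approximate on-disk size of a hash segment holding `records` records."""
    return records * BYTES_PER_RECORD

print(segment_bytes(100_000))   # 1600000
print(segment_bytes(786_433))   # 12582928
```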

#
# HashAutoExtend: Autoextend hash databases when they fill up. This allows
# them to continue to train by adding extents (extensions) to the file. There 
# will be a small delay during the growth process, as everything needs to be 
# closed and remapped. 
#
HashAutoExtend          on  

#
# HashMaxExtents: The maximum number of extents that may be created in a single
# hash file. Set this to zero for unlimited
#
HashMaxExtents          0

#
# HashExtentSize: The initial record size for newly created extents. Creating 
# this too small could result in many extents being created. Creating this too 
# large could result in excessive disk space usage. Typically, a value close 
# to half of the HashRecMax size is good.
#
HashExtentSize          393241

#
# HashPctIncrease: Increase the next extent size by n% from the size of the
# last extent. This is useful in accommodating systems where the default 
# HashExtentSize can be too small for certain high-volume users, and can also
# help keep seeks nice and speedy and/or prevent too many unnecessary extents 
# from being created when using a low HashMaxSeek. The default behavior, when 
# HashPctIncrease is not used, is to always use HashExtentSize with no 
# increase.
#
HashPctIncrease 10
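
As a sketch of how extent sizes would grow with the settings in this file (HashExtentSize 393241, HashPctIncrease 10): each extent is taken to be n% larger than the previous one. The rounding (integer truncation) and the function name are assumptions, so the real driver may produce slightly different numbers.

```python
def extent_sizes(first: int, pct: int, count: int) -> list:
    """Record counts of the first `count` extents, each pct% larger than the last."""
    sizes = []
    size = first
    for _ in range(count):
        sizes.append(size)
        size += size * pct // 100   # assumed rounding: truncate toward zero
    return sizes

print(extent_sizes(393241, 10, 3))  # [393241, 432565, 475821]
print(extent_sizes(393241, 0, 3))   # [393241, 393241, 393241]
```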

#
# HashMaxSeek: The maximum number of record seeks when inserting a new record
# before failing or adding a new extent. This ultimately translates into the
# max # of acceptable seeks per segment. Setting this too high will exhaustively
# scan each segment and hurt performance. Typically, a low value is acceptable
# as even older extents will continue to fill as training progresses.
#
HashMaxSeek             10

#
# HashConcurrentUser: If you are using a single, stateful hash database in
# daemon mode, specifying a concurrent user below will cause the user to be 
# permanently mapped into memory and shared via rwlocks. This is very fast and
# very cool if you are running a "userless" relay appliance.
#
#HashConcurrentUser     user

#
# HashConnectionCache: If running in daemon mode, this is the max # of
# concurrent connections that will be supported. NOTE: If you are using
# HashConcurrentUser, this option is ignored, as all connections are read-
# write locked instead of mutex locked.
#
HashConnectionCache     10

# -- LDAP --

#
# LDAP: Perform various LDAP functions depending on LDAPMode variable.
# Presently, the only mode supported is 'verify', which will verify the 
# existence of an unknown user in LDAP prior to creating them as a new user in 
# the system.  This is useful on some systems acting as gateway machines.
#
#LDAPMode       verify
#LDAPHost       ldaphost.mydomain.com
#LDAPFilter     "(mail=%u)"
#LDAPBase       ou=people,dc=domain,dc=com

# -- Profiles --

#
# You can specify multiple storage profiles, and specify the server to
# use on the commandline with --profile. For example:
#
#Profile DECAlpha
#MySQLServer.DECAlpha   10.0.0.1
#MySQLPort.DECAlpha     3306
#MySQLUser.DECAlpha     dspam
#MySQLPass.DECAlpha     changeme
#MySQLDb.DECAlpha       dspam
#MySQLCompress.DECAlpha true
#
#Profile Sun420R
#MySQLServer.Sun420R    10.0.0.2
#MySQLPort.Sun420R      3306
#MySQLUser.Sun420R      dspam
#MySQLPass.Sun420R      changeme
#MySQLDb.Sun420R        dspam
#MySQLCompress.Sun420R  false
#
#DefaultProfile DECAlpha

#
# If you're using storage profiles, you can set failovers for each profile.
# Of course, if you'll be failing over to another database, that database
# must have the same information as the first. If you're using a global
# database with no training, this should be relatively simple. If you're
# configuring per-user data, however, you'll need to set up some type of
# replication between databases.
#
#Failover.DECAlpha      Sun420R
#Failover.Sun420R       DECAlpha

# If the storage fails, the agent will follow each profile's failover up to
# a maximum number of failover attempts. This should be set to a maximum of
# the number of profiles you have, otherwise the agent could loop and try
# the same profile multiple times (unless this is your desired behavior).
#
#FailoverAttempts       1
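
The looping behavior mentioned above can be sketched with the two example profiles. This is only an illustration of following a failover chain up to a maximum number of attempts; the function name and the dictionary representation are hypothetical and do not reflect the agent's actual code.

```python
def failover_order(start: str, failover: dict, max_attempts: int) -> list:
    """Profiles tried in order: the starting profile, then up to max_attempts failovers."""
    order = [start]
    current = start
    for _ in range(max_attempts):
        nxt = failover.get(current)
        if nxt is None:          # no failover configured for this profile
            break
        order.append(nxt)
        current = nxt
    return order

failover = {"DECAlpha": "Sun420R", "Sun420R": "DECAlpha"}
print(failover_order("DECAlpha", failover, 1))  # ['DECAlpha', 'Sun420R']
print(failover_order("DECAlpha", failover, 3))  # ['DECAlpha', 'Sun420R', 'DECAlpha', 'Sun420R']
```

With more attempts than profiles, the same profile is retried, which is the loop the comment warns about.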

#
# Ignored headers: If DSPAM is behind other tools which may add a header to
# incoming emails, it may be beneficial to ignore these headers - especially
# if they are coming from another spam filter. If you are _not_ using one of
# these tools, however, leaving the appropriate headers commented out will
# allow DSPAM to use them as telltale signs of forged email.
#
#IgnoreHeader X-Spam-Status
#IgnoreHeader X-Spam-Scanned
#IgnoreHeader X-Virus-Scanner-Result
IgnoreHeader Date

#
# Lookup: Perform lookups on streamlined blackhole list servers (see
# http://www.nuclearelephant.com/projects/sbl/). The streamlined blacklist
# server is a machine-automated, unsupervised blacklisting system designed to
# provide real-time and highly accurate blacklisting based on network spread.
# When performing a lookup, DSPAM will automatically learn the inbound message 
# as spam if the source IP is listed. Until an official public RABL server is 
# available, this feature is only useful if you are running your own 
# streamlined blackhole list server for internal reporting among multiple mail 
# servers. Provide the name of the lookup zone below to use.
#
# This function performs standard reverse-octet.domain lookups, and while it
# will function with many RBLs, it's strongly discouraged to use those
# maintained by humans as they're often inaccurate and could hurt filter
# learning and accuracy.
#
#Lookup "sbl.yourdomain.com"
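
A "reverse-octet.domain" lookup builds a DNS name from the source IP with its octets reversed, followed by the lookup zone. The sketch below only constructs that name (it performs no DNS query); the function name is hypothetical and the IP is a documentation address.

```python
def rbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name queried for an IPv4 address in a blackhole-list zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone

print(rbl_query_name("192.0.2.10", "sbl.yourdomain.com"))
# 10.2.0.192.sbl.yourdomain.com
```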

#
# RBLInoculate: If you want to inoculate the user from RBL'd messages it would
# have otherwise missed, set this to on.
#
#RBLInoculate off

#
# Notifications: Enable the sending of notification emails to users (first
# message, quarantine full, etc.)
#
Notifications   off

#
# Purge configuration: Set dspam_clean purge default options, if not otherwise
# specified on the commandline
#
PurgeSignatures 14          # Stale signatures
PurgeNeutral    90          # Tokens with neutralish probabilities
PurgeUnused     90          # Unused tokens
PurgeHapaxes    30          # Tokens with less than 5 hits (hapaxes)
PurgeHits1S     15          # Tokens with only 1 spam hit
PurgeHits1I     15          # Tokens with only 1 innocent hit

#
# Purge configuration for SQL-based installations using purge.sql
#
#PurgeSignature off # Specified in purge.sql
#PurgeNeutral   90
#PurgeUnused    off # Specified in purge.sql
#PurgeHapaxes   off # Specified in purge.sql
#PurgeHits1S    off # Specified in purge.sql
#PurgeHits1I    off # Specified in purge.sql

#
# Local Mail Exchangers: Used for source address tracking, tells DSPAM which
# mail exchangers are local and therefore should be ignored in the Received:
# header when tracking the source of an email. Note: you should use the address
# of the host as appears between brackets [ ] in the Received header.
#
LocalMX 127.0.0.1

#
# Logging: Disabling logging for users will make usage graphs unavailable to
# them. Disabling system logging will make admin graphs unavailable.
#
SystemLog off
UserLog   off

#
# TrainPristine: for systems where the original message remains server side 
# and can therefore be presented in pristine format for retraining. This option
# will cause DSPAM to cease all writing of signatures and DSPAM headers to the 
# message, and deliver the message in as pristine format as possible. This mode
# REQUIRES that the original message in its pristine format (as of delivery) 
# be presented for retraining, as in the case of webmail, imap, or other 
# applications where the message is actually kept server-side during reading, 
# and is preserved. DO NOT use this switch unless the original message can be 
# presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS.
#
# NOTE: You can't use this setting with dspam_train; if you're going to use it,
#       wait until after you train any corpora.
#
#TrainPristine on

#
# Opt: in or out; determines DSPAM's default filtering behavior. If this value
# is set to in, users must opt-in to filtering by dropping a .dspam file in
# /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam
# folder in their home directory).  The default is opt-out, which means all 
# users will be filtered unless a .nodspam file is dropped in 
# /var/dspam/opt-out/user.nodspam
#
Opt out

#
# TrackSources: specify which (if any) source addresses to track and report
# them to syslog (mail.info). This is useful if you're running a firewall or
# blacklist and would like to use this information. Spam reporting also drops
# RABL blacklist files (see http://www.nuclearelephant.com/projects/rabl/). 
#
#TrackSources spam nonspam

#
# ParseToHeaders: In lieu of setting up individual aliases for each user,
# DSPAM can be configured to automatically parse the To: address for spam and
# false positive forwards. From there, it can be configured to either set the
# DSPAM user based on the username specified in the header and/or change the
# training class and source accordingly. The options below can be used to 
# customize most common types of header parsing behavior to avoid the need for
# multiple aliases, or if using LMTP, aliases entirely.
#
# ParseToHeader: Parse the To: headers of an incoming message. This must be
#                set to 'on' to use either of the following features.
# 
# ChangeModeOnParse: Automatically change the class (to spam or innocent)
#   depending on whether spam- or notspam- was specified, and change the source
#   to 'error'. This is convenient if you're not using aliases at all, but
#   are delivering via LMTP.
#
# ChangeUserOnParse: Automatically change the username to match that specified
#   in the To: header. For example, [email protected] will set the username
#   to bob, ignoring any --user passed in. This may not always be desirable if
#   you are using virtual email addresses as usernames. Options:
#     on or user        take the portion before the @ sign only
#     full              take everything after the initial {spam,notspam}-.
#
#ParseToHeaders on
#ChangeModeOnParse on
#ChangeUserOnParse on
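
The parsing rules described above can be sketched roughly as follows. The function name, the returned tuple, and the example address `spam-bob@example.org` are all hypothetical; this illustrates the `spam-`/`notspam-` prefix and the `user`/`full` options, not DSPAM's actual parser.

```python
def parse_to_address(address: str, mode: str = "user"):
    """Return (class, username) for a spam-/notspam- address, or None otherwise.

    mode 'user' keeps the part before the @ sign; 'full' keeps everything
    after the initial prefix.
    """
    for prefix, cls in (("spam-", "spam"), ("notspam-", "innocent")):
        if address.startswith(prefix):
            rest = address[len(prefix):]
            user = rest.split("@", 1)[0] if mode == "user" else rest
            return cls, user
    return None

print(parse_to_address("spam-bob@example.org"))             # ('spam', 'bob')
print(parse_to_address("notspam-bob@example.org", "full"))  # ('innocent', 'bob@example.org')
```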

#
# Broken MTA Options: Some MTAs don't support the proper functionality
# necessary. In these cases you can activate certain features in DSPAM to
# compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if
# the message is spam, 0 if not, or a negative code if an error has occurred.
# Specifying 'case' causes DSPAM to force the input usernames to lowercase.
# Specifying 'lineStripping' causes DSPAM to strip ^M's from messages passed
# in.
#
#Broken returnCodes
#Broken case
#Broken lineStripping

#
# MaxMessageSize: You may specify a maximum message size for DSPAM to process.
# If the message is larger than the maximum size, it will be delivered 
# without processing. Value is in bytes.
#
#MaxMessageSize 4194304
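
#
# Example: the value above is 4 * 1024 * 1024 = 4194304 bytes (4 MiB). A
# 1 MiB limit, chosen here only for illustration, would be written as:
#
#MaxMessageSize 1048576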

#
# Virus Checking: If you are running clamd, DSPAM can perform stream-based
# virus checking using TCP. Uncomment the values below to enable virus
# checking. 
#
# ClamAVResponse: reject (reject or drop the message with a permanent failure)
#                 accept (accept the message and quietly drop the message)
#                 spam   (treat as spam and quarantine/tag/whatever)
#
#ClamAVPort     3310
#ClamAVHost     127.0.0.1
#ClamAVResponse accept

# -- CLIENT / SERVER --

#
# Daemonized Server: If you are running DSPAM as a daemonized server using
# --daemon, the following parameters will override the default. Use the
# ServerPass option to set up accounts for each client machine. The DSPAM
# server will process and deliver the message based on the parameters 
# specified. If you want the client machine to perform delivery, use
# the --stdout option in conjunction with a local setup. 
#
#ServerPort             24
#ServerQueueSize        32
#ServerPID              /var/run/dspam.pid

#
# ServerMode specifies the type of LMTP server to start. This can be one of:
#     dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc
#  standard: Standard LMTP server, for communicating with Postfix or other MTA
#      auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT
#
#ServerMode dspam

# If supporting DLMTP (dspam) mode, dspam clients will require authentication 
# as they will be passing in parameters. The idents below will be used to
# determine which clients will be speaking DLMTP, so if you will be using
# both LMTP and DLMTP from the same host, be sure to use something other
# than the server's hostname below (which will be sent by the MTA during a 
# standard LMTP LHLO).
# 
#ServerPass.Relay1      "secret"
#ServerPass.Relay2      "password"

# If supporting standard LMTP mode, server parameters will need to be specified
# here, as they will not be passed in by the mail server. The ServerIdent
# specifies the 250 response code ident sent back to connecting clients and
# should be set to the hostname of your server, or an alias.
#
# NOTE: If you specify --user in ServerParameters, the RCPT TO will be
#       used only for delivery, and not set as the active user for processing.
#
#ServerParameters       "--deliver=innocent -d %u"
#ServerIdent            "localhost.localdomain"

# If you wish to use a local domain socket instead of a TCP socket, uncomment
# the following. It is strongly recommended you use local domain sockets if
# you are running the client and server on the same machine, as it eliminates
# much of the bandwidth overhead.
#
#ServerDomainSocketPath  "/tmp/dspam.sock"

#
# Client Mode: If you are running DSPAM in client/server mode, uncomment and
# set these variables. A ClientHost beginning with a / will be treated as
# a domain socket.
#
#ClientHost     /tmp/dspam.sock
#ClientIdent    "sec...@relay1"
#
#ClientHost     127.0.0.1
#ClientPort     24
#ClientIdent    "sec...@relay1"
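
#
# Example (a sketch using only the socket path shown above): when the client
# and server run on the same machine, the server's socket path and the
# client's host setting should name the same file:
#
#   ServerDomainSocketPath  "/tmp/dspam.sock"     (server side)
#   ClientHost              /tmp/dspam.sock       (client side)
#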

# RABLQueue: Touch files in the RABL queue
# If you are a reporting streamlined blackhole list participant, you can
# touch ip addresses within the directory the rabl_client process is watching.
#
#RABLQueue      /var/spool/rabl

# DataSource: If you are using any type of data source that does not include
# email-like headers (such as documents), uncomment the line below. This
# will cause the entire input to be treated like a message "body"
#
#DataSource      document

# ProcessorWordFrequency: By default, words are only counted once per message.
# If you are classifying large documents, however, you may wish to count once
# per occurrence instead.
#
#ProcessorWordFrequency  occurrence

# ProcessorURLContext: By default, a URL context is generated for URLs, which
# records their tokens as separate from words found in documents. To use
# URL tokens in the same context as words, turn this feature off. 
#
ProcessorURLContext on

# ProcessorBias: Bias causes the filter to lean more toward 'innocent', and
# usually greatly reduces false positives. It is the default behavior of
# most Bayesian filters (including dspam). 
#
# NOTE: You probably DON'T want this if you're using Markovian Weighting, unless
# you are paranoid about false positives.
#
ProcessorBias on

## EOF

------------------------------------------------------------------------------

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
