Re: [Dspam-user] Training

Stevan Bajić Thu, 28 Jan 2010 15:25:30 -0800

On Thu, 28 Jan 2010 11:55:48 -0500
Roman Gelfand <[email protected]> wrote:


> #
> # Training Mode: The default training mode to use for all operations, when
> # one has not been specified on the commandline or in the user's preferences.
> # Acceptable values are:
> #     toe     Train on Error (Only)
> #     teft    Train Everything (Trains on every message)
> #     tum     Train Until Mature (Train only tokens without enough data)
> #     notrain Do not train or store signatures (large ISP systems, post-train)
> #
> TrainingMode teft
> 
Please switch that to "toe". Using "teft" is outdated and is one part of your 
problem.


> #
> # Features: Specify features to activate by default; can also be specified
> # on the commandline. See the documentation for a list of available features.
> # If _any_ features are specified on the commandline, these are ignored.
> #
> #Feature noise
> Feature whitelist
> 
Enable "noise" (Bayesian Noise Reduction) by uncommenting the "Feature noise" 
line. It will help you.


> # Training Buffer: The training buffer waters down statistics during training.
> # It is designed to prevent false positives, but can also dramatically reduce
> # dspam's catch rate during initial training. This can be a number from 0
> # (no buffering) to 10 (maximum buffering). If you are paranoid about false
> # positives, you should probably enable this option.
> #
> #Feature tb=5
> 
Depending on the data you have already learned, enabling this option could be 
beneficial.


> #
> # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
> # responsible for parsing the message into individual tokens. Depending on
> # how many resources you are willing to trade off vs. accuracy, you may
> # choose to use a less or more detailed tokenizer:
> #   word    uniGram (single word) tokenizer
> #           Tokenizes message into single individual words/tokens
> #           example: "free" and "viagra"
> #   chain   biGram (chained tokens) tokenizer (default)
> #           Single words + chains adjacent tokens together
> #           example: "free" and "viagra" and "free viagra"
> #   sbph    Sparse Binary Polynomial Hashing tokenizer
> #           Creates sparse token patterns across sliding window of 5-tokens
> #           example: "the quick * fox jumped" and "the * * fox jumped"
> #   osb     Orthogonal Sparse biGram tokenizer
> #           Similar to SBPH, but only uses the biGrams
> #           example: "the * * fox" and "the * * * jumped"
> #
> Tokenizer chain
> 
That is the main part of your problem. It is no surprise that you retrain 
again and again and still don't get the data to flip the state. Please use 
"osb". It is much better suited to your situation.


> #
> # Preferences: Specify any preferences to set by default, unless otherwise
> # overridden by the user (see next section) or a default.prefs file.
> # If user or default.prefs are found, the user's preferences will override any
> # defaults.
> #
> Preference "trainingMode=TEFT"                # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
>
Set this to "TOE" as well.
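
Put together, the changes suggested in this reply would look roughly like the 
fragment below in dspam.conf. This is only a sketch built from the directives 
quoted in this message, not a complete configuration. The "tb=5" line is left 
commented out, since whether it helps depends on the data you have already 
learned.

```
TrainingMode toe
Feature noise
Feature whitelist
#Feature tb=5
Tokenizer osb
Preference "trainingMode=TOE"
```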

