I haven't really dealt with the utilities much.  When you say drop the
old data, do you mean going into the db and deleting the data from all
the dspam tables by hand, or using a utility?  If a utility, which one?


On Thu, Jan 28, 2010 at 6:14 PM, Stevan Bajić <[email protected]> wrote:
> On Thu, 28 Jan 2010 11:55:48 -0500
> Roman Gelfand <[email protected]> wrote:
>
>> #
>> # Training Mode: The default training mode to use for all operations, when
>> # one has not been specified on the commandline or in the user's preferences.
>> # Acceptable values are:
>> #     toe     Train on Error (Only)
>> #     teft    Train Everything (Trains on every message)
>> #     tum     Train Until Mature (Train only tokens without enough data)
>> #     notrain Do not train or store signatures (large ISP systems, post-train)
>> #
>> TrainingMode teft
>>
> Please switch that to "toe"! Using "teft" is old school and one part of your 
> problem.
>
>
>> #
>> # Features: Specify features to activate by default; can also be specified
>> # on the commandline. See the documentation for a list of available features.
>> # If _any_ features are specified on the commandline, these are ignored.
>> #
>> #Feature noise
>> Feature whitelist
>>
> Enable "noise". It's a good thing that will help you.
>
>
>> # Training Buffer: The training buffer waters down statistics during training.
>> # It is designed to prevent false positives, but can also dramatically reduce
>> # dspam's catch rate during initial training. This can be a number from 0
>> # (no buffering) to 10 (maximum buffering). If you are paranoid about false
>> # positives, you should probably enable this option.
>> #
>> #Feature tb=5
>>
> Depending on the data you already have learned, it could be beneficial to 
> enable this option.
>
>
>> #
>> # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
>> # responsible for parsing the message into individual tokens. Depending on
>> # how many resources you are willing to trade off vs. accuracy, you may
>> # choose to use a less or more detailed tokenizer:
>> #   word    uniGram (single word) tokenizer
>> #           Tokenizes message into single individual words/tokens
>> #           example: "free" and "viagra"
>> #   chain   biGram (chained tokens) tokenizer (default)
>> #           Single words + chains adjacent tokens together
>> #           example: "free" and "viagra" and "free viagra"
>> #   sbph    Sparse Binary Polynomial Hashing tokenizer
>> #           Creates sparse token patterns across sliding window of 5-tokens
>> #           example: "the quick * fox jumped" and "the * * fox jumped"
>> #   osb     Orthogonal Sparse biGram tokenizer
>> #           Similar to SBPH, but only uses the biGrams
>> #           example: "the * * fox" and "the * * * jumped"
>> #
>> Tokenizer chain
>>
> That is the main part of your problem. It is no surprise that you retrain and 
> retrain and retrain and still don't get the data to flip the state. Please 
> use "osb". It's way better for your situation.
>
>
>> #
>> # Preferences: Specify any preferences to set by default, unless otherwise
>> # overridden by the user (see next section) or a default.prefs file.
>> # If user or default.prefs are found, the user's preferences will override
>> # any defaults.
>> #
>> Preference "trainingMode=TEFT"                # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
>>
> Set this to "TOE"
>
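Taken together, the changes suggested above would leave the relevant
dspam.conf lines looking roughly like the sketch below. It assumes every
other directive stays exactly as posted, and it leaves the training
buffer commented out, since that one was only described as possibly
beneficial depending on the data already learned.

```
# Train on error only, instead of training on every message
TrainingMode toe

# Noise feature enabled alongside the existing whitelist feature
Feature noise
Feature whitelist

# Training buffer left disabled here (assumption; enable if it helps)
#Feature tb=5

# Orthogonal Sparse biGram tokenizer instead of chain
Tokenizer osb

# Default per-user preference brought in line with TrainingMode
Preference "trainingMode=TOE"
```
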
> ------------------------------------------------------------------------------
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> _______________________________________________
> Dspam-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspam-user
>
