On Thu, 28 Jan 2010 18:26:15 -0500
Roman Gelfand <[email protected]> wrote:

> I haven't really dealt with the utilities much.  When you say drop the
> old data, do you mean physically go into the db and delete the data in
> all dspam tables, or use a utility?  If a utility, which one?
> 
TRUNCATE `dspam_signature_data`;
TRUNCATE `dspam_stats`;
TRUNCATE `dspam_token_data`;
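
A minimal sketch of running the statements above from the mysql client, with
a row count before and after as a sanity check. The database name `dspam` is
an assumption; substitute whatever database your DSPAM installation uses.
In MySQL, TRUNCATE cannot be rolled back, so take a dump first if you might
want the old data again.

```sql
-- Assumption: the DSPAM tables live in a database named `dspam`.
USE dspam;

-- How much data is there before the wipe?
SELECT COUNT(*) AS signatures FROM dspam_signature_data;
SELECT COUNT(*) AS stats_rows FROM dspam_stats;
SELECT COUNT(*) AS tokens     FROM dspam_token_data;

-- Drop the learned data (same three tables as above).
TRUNCATE TABLE dspam_signature_data;
TRUNCATE TABLE dspam_stats;
TRUNCATE TABLE dspam_token_data;

-- Each of these should now return 0.
SELECT COUNT(*) AS signatures FROM dspam_signature_data;
SELECT COUNT(*) AS stats_rows FROM dspam_stats;
SELECT COUNT(*) AS tokens     FROM dspam_token_data;
```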


> 
> On Thu, Jan 28, 2010 at 6:14 PM, Stevan Bajić <[email protected]> wrote:
> > On Thu, 28 Jan 2010 11:55:48 -0500
> > Roman Gelfand <[email protected]> wrote:
> >
> >> #
> >> # Training Mode: The default training mode to use for all operations, when
> >> # one has not been specified on the commandline or in the user's preferences.
> >> # Acceptable values are:
> >> #     toe     Train on Error (Only)
> >> #     teft    Train Everything (Trains on every message)
> >> #     tum     Train Until Mature (Train only tokens without enough data)
> >> #     notrain Do not train or store signatures (large ISP systems, post-train)
> >> #
> >> TrainingMode teft
> >>
> > Please switch that to "toe"! Using "teft" is old school and one part of 
> > your problem.
> >
> >
> >> #
> >> # Features: Specify features to activate by default; can also be specified
> >> # on the commandline. See the documentation for a list of available features.
> >> # If _any_ features are specified on the commandline, these are ignored.
> >> #
> >> #Feature noise
> >> Feature whitelist
> >>
> > Enable "noise". It's a good thing that will help you.
> >
> >
> >> # Training Buffer: The training buffer waters down statistics during training.
> >> # It is designed to prevent false positives, but can also dramatically reduce
> >> # dspam's catch rate during initial training. This can be a number from 0
> >> # (no buffering) to 10 (maximum buffering). If you are paranoid about false
> >> # positives, you should probably enable this option.
> >> #
> >> #Feature tb=5
> >>
> > Depending on the data you already have learned, it could be beneficial to 
> > enable this option.
> >
> >
> >> #
> >> # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
> >> # responsible for parsing the message into individual tokens. Depending on
> >> # how many resources you are willing to trade off vs. accuracy, you may
> >> # choose to use a less or more detailed tokenizer:
> >> #   word    uniGram (single word) tokenizer
> >> #           Tokenizes message into single individual words/tokens
> >> #           example: "free" and "viagra"
> >> #   chain   biGram (chained tokens) tokenizer (default)
> >> #           Single words + chains adjacent tokens together
> >> #           example: "free" and "viagra" and "free viagra"
> >> #   sbph    Sparse Binary Polynomial Hashing tokenizer
> >> #           Creates sparse token patterns across sliding window of 5-tokens
> >> #           example: "the quick * fox jumped" and "the * * fox jumped"
> >> #   osb     Orthogonal Sparse biGram tokenizer
> >> #           Similar to SBPH, but only uses the biGrams
> >> #           example: "the * * fox" and "the * * * jumped"
> >> #
> >> Tokenizer chain
> >>
> > That is the main part of your problem. It is no surprise that you retrain 
> > and retrain and retrain and still don't get the data to flip the state. 
> > Please use "osb". It's way better for your situation.
> >
> >
> >> #
> >> # Preferences: Specify any preferences to set by default, unless otherwise
> >> # overridden by the user (see next section) or a default.prefs file.
> >> # If user or default.prefs are found, the user's preferences will override any
> >> # defaults.
> >> #
> >> Preference "trainingMode=TEFT"                # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
> >>
> > Set this to "TOE"
> >
> > ------------------------------------------------------------------------------
> > The Planet: dedicated and managed hosting, cloud storage, colocation
> > Stay online with enterprise data centers and the best network in the 
> > business
> > Choose flexible plans and management services without long-term contracts
> > Personal 24x7 support from experience hosting pros just a phone call away.
> > http://p.sf.net/sfu/theplanet-com
> > _______________________________________________
> > Dspam-user mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dspam-user
> >
> 
