I haven't really dealt with the utilities much. When you say drop the old data, do you mean physically going into the db and deleting the data in all dspam tables, or using a utility? If a utility, which one?

On Thu, Jan 28, 2010 at 6:14 PM, Stevan Bajić <[email protected]> wrote:
> On Thu, 28 Jan 2010 11:55:48 -0500
> Roman Gelfand <[email protected]> wrote:
>
>> #
>> # Training Mode: The default training mode to use for all operations, when
>> # one has not been specified on the commandline or in the user's preferences.
>> # Acceptable values are:
>> #  toe     Train on Error (Only)
>> #  teft    Train Everything (Trains on every message)
>> #  tum     Train Until Mature (Train only tokens without enough data)
>> #  notrain Do not train or store signatures (large ISP systems, post-train)
>> #
>> TrainingMode teft
>>
> Please switch that to "toe"! Using "teft" is old school and one part of your
> problem.
>
>
>> #
>> # Features: Specify features to activate by default; can also be specified
>> # on the commandline. See the documentation for a list of available features.
>> # If _any_ features are specified on the commandline, these are ignored.
>> #
>> #Feature noise
>> Feature whitelist
>>
> Enable "noise". It's a good thing that will help you.
>
>
>> # Training Buffer: The training buffer waters down statistics during training.
>> # It is designed to prevent false positives, but can also dramatically reduce
>> # dspam's catch rate during initial training. This can be a number from 0
>> # (no buffering) to 10 (maximum buffering). If you are paranoid about false
>> # positives, you should probably enable this option.
>> #
>> #Feature tb=5
>>
> Depending on the data you already have learned, it could be beneficial to
> enable this option.
>
>
>> #
>> # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
>> # responsible for parsing the message into individual tokens. Depending on
>> # how many resources you are willing to trade off vs. accuracy, you may
>> # choose to use a less or more detailed tokenizer:
>> #   word    uniGram (single word) tokenizer
>> #           Tokenizes message into single individual words/tokens
>> #           example: "free" and "viagra"
>> #   chain   biGram (chained tokens) tokenizer (default)
>> #           Single words + chains adjacent tokens together
>> #           example: "free" and "viagra" and "free viagra"
>> #   sbph    Sparse Binary Polynomial Hashing tokenizer
>> #           Creates sparse token patterns across sliding window of 5-tokens
>> #           example: "the quick * fox jumped" and "the * * fox jumped"
>> #   osb     Orthogonal Sparse biGram tokenizer
>> #           Similar to SBPH, but only uses the biGrams
>> #           example: "the * * fox" and "the * * * jumped"
>> #
>> Tokenizer chain
>>
> That is the main part of your problem. It is no surprise that you retrain and
> retrain and retrain and still don't get the data to flip the state. Please
> use "osb". It's way better for your situation.
>
>
>> #
>> # Preferences: Specify any preferences to set by default, unless otherwise
>> # overridden by the user (see next section) or a default.prefs file.
>> # If user or default.prefs are found, the user's preferences will override any
>> # defaults.
>> #
>> Preference "trainingMode=TEFT" # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
>>
> Set this to "TOE"
>
> ------------------------------------------------------------------------------
> The Planet: dedicated and managed hosting, cloud storage, colocation
> Stay online with enterprise data centers and the best network in the business
> Choose flexible plans and management services without long-term contracts
> Personal 24x7 support from experience hosting pros just a phone call away.
> http://p.sf.net/sfu/theplanet-com
> _______________________________________________
> Dspam-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspam-user
