On Thu, 28 Jan 2010 18:26:15 -0500 Roman Gelfand <[email protected]> wrote:

> I haven't really dealt with the utilities much.  When you say drop the
> old data, do you mean physically go into the db delete data on all
> dspam tables or use a utility?  If use a utility, which one?
>

Directly in the database (the backtick quoting below is MySQL syntax):

TRUNCATE `dspam_signature_data`;
TRUNCATE `dspam_stats`;
TRUNCATE `dspam_token_data`;

>
> On Thu, Jan 28, 2010 at 6:14 PM, Stevan Bajić <[email protected]> wrote:
> > On Thu, 28 Jan 2010 11:55:48 -0500
> > Roman Gelfand <[email protected]> wrote:
> >
> >> #
> >> # Training Mode: The default training mode to use for all operations, when
> >> # one has not been specified on the commandline or in the user's preferences.
> >> # Acceptable values are:
> >> #    toe     Train on Error (Only)
> >> #    teft    Train Everything (Trains on every message)
> >> #    tum     Train Until Mature (Train only tokens without enough data)
> >> #    notrain Do not train or store signatures (large ISP systems, post-train)
> >> #
> >> TrainingMode teft
> >>
> > Please switch that to "toe"! Using "teft" is old school and one part of
> > your problem.
> >
> >
> >> #
> >> # Features: Specify features to activate by default; can also be specified
> >> # on the commandline. See the documentation for a list of available features.
> >> # If _any_ features are specified on the commandline, these are ignored.
> >> #
> >> #Feature noise
> >> Feature whitelist
> >>
> > Enable "noise". It's a good thing that will help you.
> >
> >
> >> # Training Buffer: The training buffer waters down statistics during training.
> >> # It is designed to prevent false positives, but can also dramatically reduce
> >> # dspam's catch rate during initial training. This can be a number from 0
> >> # (no buffering) to 10 (maximum buffering). If you are paranoid about false
> >> # positives, you should probably enable this option.
> >> #
> >> #Feature tb=5
> >>
> > Depending on the data you already have learned, it could be beneficial to
> > enable this option.
> >
> >
> >> #
> >> # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
> >> # responsible for parsing the message into individual tokens. Depending on
> >> # how many resources you are willing to trade off vs. accuracy, you may
> >> # choose to use a less or more detailed tokenizer:
> >> #   word    uniGram (single word) tokenizer
> >> #           Tokenizes message into single individual words/tokens
> >> #           example: "free" and "viagra"
> >> #   chain   biGram (chained tokens) tokenizer (default)
> >> #           Single words + chains adjacent tokens together
> >> #           example: "free" and "viagra" and "free viagra"
> >> #   sbph    Sparse Binary Polynomial Hashing tokenizer
> >> #           Creates sparse token patterns across sliding window of 5-tokens
> >> #           example: "the quick * fox jumped" and "the * * fox jumped"
> >> #   osb     Orthogonal Sparse biGram tokenizer
> >> #           Similar to SBPH, but only uses the biGrams
> >> #           example: "the * * fox" and "the * * * jumped"
> >> #
> >> Tokenizer chain
> >>
> > That is the main part of your problem. It is no surprise that you retrain
> > and retrain and retrain and still don't get the data to flip the state.
> > Please use "osb". It's way better for your situation.
> >
> >
> >> #
> >> # Preferences: Specify any preferences to set by default, unless otherwise
> >> # overridden by the user (see next section) or a default.prefs file.
> >> # If user or default.prefs are found, the user's preferences will override any
> >> # defaults.
> >> #
> >> Preference "trainingMode=TEFT"    # { TOE | TUM | TEFT | NOTRAIN } -> default:teft
> >>
> > Set this to "TOE"
> >
> > ------------------------------------------------------------------------------
> > The Planet: dedicated and managed hosting, cloud storage, colocation
> > Stay online with enterprise data centers and the best network in the
> > business
> > Choose flexible plans and management services without long-term contracts
> > Personal 24x7 support from experience hosting pros just a phone call away.
> > http://p.sf.net/sfu/theplanet-com
> > _______________________________________________
> > Dspam-user mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dspam-user
> >
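
To make the quoted suggestions easier to apply in one go, here is roughly
what the affected lines of dspam.conf would look like afterwards. This is
only a sketch put together from the excerpts you posted; I am assuming the
rest of the file stays as it is, and I have left the training buffer
commented out because whether it helps depends on your data.

```
# Train on error only, instead of on every message
TrainingMode toe

# "noise" enabled in addition to the existing whitelist feature
Feature noise
Feature whitelist

# Optional, still disabled here; enable only if false positives worry you
#Feature tb=5

# Orthogonal Sparse biGram tokenizer instead of chain
Tokenizer osb

# Default preference brought in line with TrainingMode above
Preference "trainingMode=TOE"
```

The tokenizer switch is the reason for starting from empty tables: tokens
learned with chain should be of little use once osb is active, so run the
TRUNCATE statements first and retrain after changing the config.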
