Stevan Bajić wrote:
> On Fri, 18 Dec 2009 00:58:04 +0100
> Frantisek Hanzlik <[email protected]> wrote:
>
>> Stevan Bajić wrote:
>>> On Thu, 17 Dec 2009 18:28:58 +0100
>>> Frantisek Hanzlik <[email protected]> wrote:
>>>
>>>> I want to upgrade several DSPAM installations, all of them using the hash driver,
>>>> to 3.9.0. Are there any suggestions? Is it possible to use the old databases, or
>>>> is that not recommended?
>>>>
>>> You can use old databases without issues.
>>>
>>>
>>>> Maybe, because of the different (better) charset decoding (important for
>>>> me, as Czech text comes in utf8, 8859-2, cp1250, ... encodings) and HTML
>>>> parsing in 3.9.0, it is better to throw away the old databases and create
>>>> new ones, probably using corpus training?
>>>>
>>> Since you are using the Hash driver, any training you would want to do
>>> can only be on a per-user basis, since the Hash driver does not have
>>> DSPAM-groups support.
>>
>> Hello Stevan,
>>
> Hi Frantisek,
>
>> how should I understand this (Hash driver does not have DSPAM-groups support)?
>>
> Semi-correct. Everything that involves reading more than one database/CSS file
> does not work with the Hash driver.
Aha. Then with the hash driver it is probably not possible to use merged and classification groups, and maybe not inoculation groups either, but shared groups should be fine.
>
>> The README says that the hash driver does not support merged groups, but the others
>> are probably OK, yes?
>>
> I need to look deeper into the code, but as far as I remember, anything that
> involves reading more than just one database/CSS file does not work.
>
>
>> In my configurations I mainly use "shared,managed" or
>> "shared" groups and they work fine.
>>
> Shared just uses ONE single CSS file for a bunch of users. That should
> work with the Hash driver.
>
>
>> Or is it not possible to use the dspam_train script for DSPAM pre-training?
>>
> Yes, yes. It is possible to use the dspam_train script to pretrain the Hash
> driver.
>
>
>> And in the dspam sources there is a scripts/train.pl script; what is it for?
>>
> That is an older version of dspam_train that is far, far, far behind the
> current dspam_train in terms of functionality and in terms of the DSPAM functions
> it uses (for example, it does not handle blocklist, blacklist, etc.). You can use that
> script if you want, use dspam_train, or make your own training script. I, for example,
> use my own script that uses TONE (Train on Error or Near Error) with additional
> features like asymmetric threshold/thickness for the spam/ham training, double-sided
> training (this is essential for the Hyperspace classifier in CRM114, and I found that
> idea good, so I implemented it in my training script as well), etc. Most of my
> ideas about how to train the correct way came from using CRM114/OSBF-Lua for
> many years. My script is also many times faster than the original dspam_train,
> since I don't use signature-based training (so I don't need to purge signatures after
> a long training run), plus other small things that I need because I use the script to
> feed fresh data, captured on my SPAM honeypot, to my DSPAM instance.
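The TONE idea mentioned above (train only on errors and near errors, with separate spam and ham thresholds and a cap on retrains per message) can be illustrated with a minimal Python sketch. This is not Stevan's dspam_train_tone_v5, not dspam_train, and not how DSPAM scores tokens: ToyTokenModel, tone_train, the mean-of-token-probabilities score and the default threshold values are all hypothetical, chosen only to make the control flow visible. The two thresholds are independent parameters, so the spam and ham margins can be set asymmetrically; --refute unlearning and double-sided training are not modeled.

```python
from collections import Counter


class ToyTokenModel:
    """Hypothetical stand-in for a token database; NOT DSPAM's CSS format or scoring."""

    def __init__(self):
        self.spam = Counter()  # token -> times learned as spam
        self.ham = Counter()   # token -> times learned as ham

    def learn(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(tokens)

    def spam_score(self, tokens):
        if not tokens:
            return 0.5  # no evidence either way
        # Mean per-token spamminess with add-one smoothing; unseen tokens score 0.5.
        probs = [(self.spam[t] + 1) / (self.spam[t] + self.ham[t] + 2) for t in tokens]
        return sum(probs) / len(probs)


def in_error_or_near_error(score, is_spam, spam_threshold, ham_threshold):
    # Spam must score at least spam_threshold and ham at most ham_threshold;
    # anything else is a misclassification or falls inside the "near error" margin.
    return score < spam_threshold if is_spam else score > ham_threshold


def tone_train(model, corpus, spam_threshold=0.7, ham_threshold=0.3, max_retrain=5):
    """Train On or Near Error: learn a message only while it is misclassified or
    too close to the decision boundary, at most max_retrain times per message.
    Returns the total number of learn() calls."""
    learned = 0
    for tokens, is_spam in corpus:
        for _ in range(max_retrain):
            score = model.spam_score(tokens)
            if not in_error_or_near_error(score, is_spam, spam_threshold, ham_threshold):
                break  # confidently correct: leave the token counts alone
            model.learn(tokens, is_spam)
            learned += 1
    return learned


if __name__ == "__main__":
    corpus = [
        (["cheap", "pills", "now"], True),
        (["meeting", "agenda", "monday"], False),
        (["cheap", "watches", "now"], True),
        (["monday", "lunch", "agenda"], False),
    ]
    model = ToyTokenModel()
    print(tone_train(model, corpus))  # 6
    print(tone_train(model, corpus))  # 0: every message is now outside its margin
```

On the first pass the four toy messages are learned six times in total; on a second pass nothing is learned, because every message already scores outside its margin. A TEFT-style trainer would instead learn every message on every pass.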
> I needed that additional functionality because all training is done automatically, without
> my intervention, and I need the script to be rock solid and to keep running even
> if some mails produce errors in DSPAM during training.
> Currently I have the following options:
> ----------------------------------------------------------------
> theia spam-stuff # ./dspam_train_tone_v5 --help
> ERROR: spam corpus must be path to maildir directory or MBOX file.
>
> Usage: ./dspam_train_tone_v5
>   [[username]|[--user username]]     User name to use for training
>   [--client]                         To run in client mode
>   [--random]                         Randomly process corpi
>   [--refute]                         To unlearn errors from opposite class
>   [--subject]                        To show subject from error/unlearn/TONE
>   [--max-retrain max_retrain]        Maximum relearns per error/TONE
>   [--spam-threshold threshold]       TONE Spam threshold
>   [--ham-threshold threshold]        TONE Ham threshold
>   [--overleap count]                 Overleap certain count of messages
>   [--stop-after count]               Stop after processed certain count of messages
>   [[-i index]|[spam_dir] [nonspam_dir]]
>
> theia spam-stuff #
> ----------------------------------------------------------------

Eh, I must admit I don't fully understand all of this fine theory.

>>> I would say that you should keep the old databases and run the
>>> clean process (cssclean/csscompress) daily to purge old tokens from the
>>> database. Sooner or later the old unused tokens will vanish from the
>>> database, and you will only have new tokens.
>>>
>>> As soon as you use 3.9.0, your users will benefit from the different (better)
>>> charset decoding and HTML parsing. Purging/removing the database will not
>>> affect that capability in either a negative or a positive way.
>>>
>>
>> Well, I understand. I wanted to try pre-training DSPAM from a prepared spam and ham
>> corpus, as I expect slightly better accuracy, in addition to starting with a fresh
>> 3.9.0-built CSS, especially for lazy users who don't train DSPAM properly.
>>
> Then you should definitely use TOE or TUM, but NOT TEFT. I mean in production.
> For training you can use whatever you think is best for you.

Yes, after some training I usually switch to TOE. The README suggests it too when database cleanups are being done.

>> Sorry for my terrible English.
>>
> No problem at all.

Yes, I know: not for you, but for me it is. Still, it's better than nothing. While we're at it (I don't know whether you have noticed it), the day before yesterday I sent a Czech web UI translation via the bug tracker. The only untranslated item (besides some shortcuts in nav_admin_user.html, which are probably better left as they are) is the "Tweak -1" button in nav_performance.html. Could you please briefly explain its function?

Thanks,
Franta

_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
