On Mon, 2 Aug 2010 01:26:10 +0200 (CEST), "Imposit.com - webmaster" <webmas...@imposit.com> wrote:
> Yes, you're right, I'm sorry. I wrote it before
> I realized later that it's for virtual user setups only. I was upgrading at
> 4am, saw the addition for InnoDB and just put it in without really noticing.
> I mean, I read it, but hey, I was tired :-)
>
> I just wanted to test as much as possible in the new versions...
> So we tested that the documentation isn't 100% proof against me when I'm tired,
> lol.
> No, honestly, it should be added to make clear that the constraints are for
> virtual users only. I read it wrong and overlooked the virtual user part.
> Stupid, I know. I should take a day off or so.
>
>
> | Big in what sense? Size or amount of rows? You will probably be
> | surprised how small my data is. I mean: it is relatively small compared
> | to the total number of users/domains on that system.
>
> Yeah, the size (MB) of the whole DB, just out of interest.
>
Around 400MB.
> | I accomplish this by using a merged group that I constantly train.
> | This helps me to keep the individual user data very, very, very small.
> |
> | I don't just train like normal admins do. I use my own training
> | script.
>
> Scripts are welcome, you know that *ggg*
>
LOL. I am not going to publish that. I made that script for myself, mainly because I wanted to use TONE (Train On error or Near Error), and the script is written in such a way that I can handle it. Publishing it would expose me to more (unnecessary) questions from people trying to use the script without understanding anything about the mathematical topic the script deals with. This is something that I want to avoid.

> | I don't understand that. Could you rephrase that question?
>
> OK. Since you have to retrain the whole system when you switch the
> tokenizer, my idea was to take every full quarantine mbox on my system (OK,
> not all of them, but some big ones),
> merge them together and use them for training with the new tokenizer.
>
Aha. I see. Trust me when I tell you that SPAM is not your problem. You will find a gazillion corpora out there with a good load of SPAM. Your problem is HAM. You need to get a lot of HAM for the training, and the quarantine just holds SPAM and no HAM. Training with the quarantine only is pointless.

The other issue is that SPAM at its core is easy to find. A bunch of SPAM messages is enough for something like OSB. HAM, on the other hand, can be very complex and diverse. In order to have a good catch rate you should train at least twice as much HAM as SPAM.

SPAM is easy. Patterns in SPAM have been more or less the same for many years. What has changed is the technology used to send those SPAM messages. That's all. At its core a SPAM message is a simple construct, and something like OSB or SBPH will catch up very quickly and be able to identify SPAM messages.

> The idea is that the users don't see much difference when I switch it.
>
Then pre-train on an empty instance and then on day X just switch the data. Switching to OSB in production will result in your users seeing a difference. At the very least they will see FP/FN until your token database is accurate enough.

-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
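
A minimal sketch of the "at least twice as much HAM as SPAM" rule of thumb from the reply above, assuming the HAM and SPAM messages have already been loaded into Python lists. The function name `balance_corpus` and its parameters are made up for illustration; they are not part of DSPAM and have nothing to do with the private training script mentioned in the thread.

```python
import random


def balance_corpus(ham, spam, ham_per_spam=2, seed=0):
    """Select a training set with at least `ham_per_spam` HAM per SPAM message.

    `ham` and `spam` are lists of messages (any objects). All HAM is kept;
    SPAM is randomly sampled down when there is too much of it.
    """
    if ham_per_spam <= 0:
        raise ValueError("ham_per_spam must be positive")
    # The largest amount of SPAM that the available HAM can "pay for".
    max_spam = len(ham) // ham_per_spam
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    if len(spam) > max_spam:
        spam = rng.sample(spam, max_spam)
    return list(ham), list(spam)


# Hypothetical corpus: 10 HAM messages and 100 SPAM messages.
ham_sel, spam_sel = balance_corpus(
    ["ham%d" % i for i in range(10)],
    ["spam%d" % i for i in range(100)],
)
print(len(ham_sel), len(spam_sel))  # 10 5
```

With an empty HAM list (the "quarantine only" case) the function returns no SPAM either, which mirrors the point that a quarantine-only corpus gives nothing useful to train on.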