On Sat, Dec 19, 2009 at 03:28:01PM +0100, Stevan Bajić wrote:
> On Sat, 19 Dec 2009 14:30:28 +0100 (CET)
> "Nicolas Grekas" <nicolas.gre...@espci.org> wrote:
>
> > > I would do that differently. I would query the default (uid 0)
> >
> > You are right! I learned only very recently that uid 0 is the default,
> > and forgot to take it into account.
> >
> :)
>
> > > According to dspam.conf:[...]
> >
> > So I've taken your updated script and adapted it to follow as closely as
> > possible what dspam_clean does (at least as described in the man page).
> >
> > So my changes are:
> > - add the same SQL variables that dspam_clean manages
> >
> Great.
>
> > - load the uid 0 training preferences (in a single query)
> >
> Great.
>
> > I've also added a query to delete old tokens whose probability is between
> > 0.35 and 0.65.
> >
> Here I have to step in. That 0.35 to 0.65 range is not as easy to compute as
> you have done in the SQL clause. The problem is that to compute the
> probability you would need to query the stats table, use the totals from
> there, then read PValue and do the computation based on what is there. And
> this is just the basic stuff. You would still need to read the group file
> from DSPAM, check whether the user belongs to a shared/merged group, load the
> totals from there as well, and so on. To make it short: I would not try to
> purge neutral tokens from within the SQL purge script. It would get too
> complicated.
>
> > For the transaction parts, I've always thought that a single SQL query is
> > atomic, so there is no need for a transaction around just one query.
> > Am I wrong?
> >
> No. You are right. Single queries are atomic. It's just convenient to have
> the transaction statements in the script in case someone later extends that
> block and adds other statements there. Then the person modifying the block
> does not need to care about transactions, and we could later even implement
> a rollback if needed.
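Stevan's objection to purging "neutral" tokens in SQL can be illustrated with a minimal Python sketch. It assumes a simplified Graham-style per-token estimate, p = (s/S) / (s/S + i/I), where s and i are a token's spam and innocent hits and S and I are the user's spam and innocent totals (the numbers kept in the stats table). This is an assumption for illustration only: DSPAM's real calculation depends on the configured algorithm and, as Stevan notes, on shared/merged group totals. The helper names `token_probability` and `is_neutral` are hypothetical. The point is that one and the same token row can fall inside or outside the 0.35 to 0.65 band depending on per-user totals, so a WHERE clause on the token table alone cannot decide it.

```python
def token_probability(spam_hits, innocent_hits, total_spam, total_innocent):
    """Hypothetical, simplified Graham-style spam probability for one token.

    total_spam / total_innocent stand in for the per-user totals that DSPAM
    keeps in its stats table; group totals are deliberately ignored here.
    """
    if total_spam == 0 or total_innocent == 0:
        return 0.5  # no training data for one class: treat as neutral
    spam_freq = spam_hits / total_spam
    innocent_freq = innocent_hits / total_innocent
    if spam_freq + innocent_freq == 0:
        return 0.5  # token never seen in either class: neutral
    return spam_freq / (spam_freq + innocent_freq)


def is_neutral(p, low=0.35, high=0.65):
    """True if p lies in the 0.35..0.65 band discussed in the thread."""
    return low <= p <= high


# Identical token row (10 spam hits, 10 innocent hits) for two users
# whose stats-table totals differ.
balanced = token_probability(10, 10, total_spam=1000, total_innocent=1000)
skewed = token_probability(10, 10, total_spam=100, total_innocent=1000)

print(f"balanced user: p={balanced:.3f}, neutral={is_neutral(balanced)}")  # p = 0.5, neutral
print(f"skewed user:   p={skewed:.3f}, neutral={is_neutral(skewed)}")      # p about 0.909, not neutral
```

Because the result depends on values outside dspam_token_data, a purge based on hit counts alone would delete the wrong tokens for at least one of these users.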
>
> > > >> -- Cleanup dictionaries of passive users
> > > Such a query should run as one of the first queries. But why do you punish
> > > users who have not reclassified anything?
> >
> > Then that may be too specific to my setup... :)
> >
> No, no. My error. I later realized that it is the signature table and not the
> token table, so it doesn't matter at all when you purge them.
>
> > So, how about this new proposal?
> >
> Looks good IMHO. I need to test it quickly and then push it to Git :)
> Thanks for taking the time to craft those SQL clauses. Now you should be nice,
> go ahead, install PostgreSQL and do the same for PostgreSQL :) :) :) ;)
>
> BTW: I have done some quick tests with MySQL 5.1.41 and the additional
> indices. On an InnoDB table, adding those indices does not really speed up
> the purging. It does speed it up, but so insignificantly that I ask myself
> whether it is really that beneficial to index all fields of the
> dspam_token_data table (and double the size of the table). Did you see a big
> performance impact when enabling the indices on all 3 additional fields?
> What engine are you using?
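The trade-off Stevan describes can be explored with a small, hypothetical experiment. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL/InnoDB, a table modeled loosely on dspam_token_data, and an assumed hit-count purge predicate; none of these are DSPAM's actual schema or purge script. It shows two things: each extra index is its own B-tree, so the database grows, and a predicate on an arithmetic expression such as `(innocent_hits * 2) + spam_hits < 5` cannot use single-column indexes on those fields, so SQLite's planner falls back to a full scan anyway. InnoDB's storage and optimizer differ, so treat this as an illustration, not a measurement.

```python
import sqlite3

# Hypothetical stand-in for dspam_token_data (columns modeled loosely on the
# DSPAM MySQL schema). SQLite is used only because it ships with Python.
TABLE_DDL = """
CREATE TABLE dspam_token_data (
    uid INTEGER NOT NULL,
    token INTEGER NOT NULL,
    spam_hits INTEGER NOT NULL DEFAULT 0,
    innocent_hits INTEGER NOT NULL DEFAULT 0,
    last_hit TEXT NOT NULL,
    PRIMARY KEY (uid, token)
)
"""

# One extra index per additional field, as discussed in the thread.
EXTRA_INDEXES = (
    "CREATE INDEX idx_spam_hits ON dspam_token_data (spam_hits)",
    "CREATE INDEX idx_innocent_hits ON dspam_token_data (innocent_hits)",
    "CREATE INDEX idx_last_hit ON dspam_token_data (last_hit)",
)

# Assumed hit-count purge predicate; the real purge scripts may differ.
PURGE_SQL = "DELETE FROM dspam_token_data WHERE (innocent_hits * 2) + spam_hits < 5"


def build(with_extra_indexes, rows=20000):
    """Create an in-memory table filled with synthetic token rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(TABLE_DDL)
    if with_extra_indexes:
        for ddl in EXTRA_INDEXES:
            conn.execute(ddl)
    conn.executemany(
        "INSERT INTO dspam_token_data VALUES (?, ?, ?, ?, ?)",
        ((1, t, t % 7, t % 5, "2009-12-19") for t in range(rows)),
    )
    conn.commit()
    return conn


def pages(conn):
    """Database size in pages (both builds use the default page size)."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


def purge_plan(conn):
    """Human-readable query-plan steps for the purge DELETE (not executed)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + PURGE_SQL).fetchall()
    return [row[-1] for row in rows]  # last column is the description text


plain = build(False)
indexed = build(True)
print("pages without / with extra indexes:", pages(plain), pages(indexed))
print("purge plan with extra indexes:", purge_plan(indexed))
```

The indexed build is larger, yet its purge plan is still a table scan, which is consistent with the small speed-up Stevan measured.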
Stevan,

I am not a MySQL expert, but no index update is free, so less is more,
especially when there is no significant performance advantage. I would
think that a full table scan would be the cheapest way to purge tokens in
a DB, unless you are running a very small DSPAM instance and all of your
token data is in memory.

Regards,
Ken

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel