On Sat, Dec 19, 2009 at 03:28:01PM +0100, Stevan Baji?? wrote:
> On Sat, 19 Dec 2009 14:30:28 +0100 (CET)
> "Nicolas Grekas" <nicolas.gre...@espci.org> wrote:
> 
> > > I would do that differently. I would query the default (uid 0)
> > 
> > You are right ! I've learnd that uid 0 is the default very recently
> > and forgot to take it into account
> > 
> :)
> 
> 
> > > According to dspam.conf:[...]
> > 
> > so, I've taken your updated script, and crafted it to follow as close as
> > possible what dspam_clean does (at least as said in the man page).
> > 
> > So my modifs are :
> > - add the same sql variables as dspam_clean manages
> >
> Great.
> 
> > - load uid 0 training pref (in a single query)
> > 
> Great.
> 
> 
> > I've also added a query to delete old token whose probability is between 
> > 0.35
> > and 0.65.
> > 
> Here I have to intercept. That 0.35 to 0.65 is not that easy to compute as 
> you have done in the SQL clause. The problem is that to compute the 
> probability you would need to query the stats table and use the totals from 
> there and then you would need to read PValue and based on what is there do 
> the computation. And this is just the basic stuff. You would still need to 
> read the group file from DSPAM and look if the user is belonging to a 
> shared/merged group and load the totals from there as well and and and... to 
> make it short: I would not try to purge neutral tokens from within the SQL 
> purge script. It will get to complicated.
> 
> 
> > For the transaction parts, I've always though that a single SQL query is
> > always atomic, so no need for a transaction for just one query.
> > Am I wrong ?
> > 
> No. You are right. Single queries are atomic. It's just convinient to add the 
> transaction parts into the script in case that one is going to extend that 
> block and add other stuff there. Then the person modifying the block does not 
> need to care about transactions and we could later even implement a roll-back 
> if needed.
> 
> 
> > >> -- Cleanup dictionnaries of passive users
> > 
> > > Such a query should run as one of the first queries. But why do you punish
> > > users not having reclassified anything?
> > 
> > Then that may be too specific to my setup ... :)
> > 
> No, no. My error. I later realized that it is the signature table and not the 
> token table. So it's not at all important when you purge them.
> 
> 
> > So, how about this new proposition ?
> > 
> Looking good IMHO. Need to quickly test it and then push it to GIT :)
> Thanks for the time to craft those SQL clauses. Now you should be nice and go 
> on and install PostgreSQL and do the same for PostgreSQL :) :) :) ;)
> 
> btw: I have done some quick tests with MySQL 5.1.41 and the additional 
> indices. On a InnoDB table adding those indices do not speed up the purging. 
> They do speed up but it's so unsignificant that I ask my self if it is really 
> that beneficial to index all fields from the dspam_token_data table (and 
> double the size of the table)? Had you a big performance impact when enabling 
> the index of all 3 additional fields? What engine are you using?

Stevan,

I am not a MySQL expert, but no index updates are free so less is more, 
especially
when there is not a significat performance advantage. I would think that a full
table scan would be the cheapest way to purge tokens in a DB, unless you are 
running
a very small DSPAM instance and all of your token data is in memory.

Regards,
Ken

------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel

Reply via email to