> I would do that differently. I would query the default (uid 0)

You are right! I learned only very recently that uid 0 is the default,
and forgot to take it into account.

> According to dspam.conf:[...]

So I took your updated script and adapted it to follow as closely as
possible what dspam_clean does (at least according to its man page).

My changes are:
- add the same SQL variables that dspam_clean manages
- load the uid 0 training preference (in a single query)
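
The uid 0 fallback can be checked on its own before any DELETE is run. This is only a read-only sketch: it assumes the dspam_preferences columns the script relies on (uid, preference, value) and reuses the script's @TrainingMode default; the column alias effective_mode is mine.

```sql
-- Read-only preview: effective training mode for every uid that owns tokens.
-- Per-user preference wins, then the uid 0 default, then @TrainingMode.
SET @TrainingMode = 'TEFT';

SELECT u.uid,
       COALESCE(p.value, d.value, @TrainingMode) AS effective_mode
  FROM (SELECT DISTINCT uid FROM dspam_token_data) u
    LEFT JOIN dspam_preferences p ON p.preference = 'trainingMode' AND p.uid = u.uid
    LEFT JOIN dspam_preferences d ON d.preference = 'trainingMode' AND d.uid = 0
 ORDER BY u.uid;
```

If a user shows up with an unexpected mode here, the two "unused tokens" queries would treat that user the same way.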

I've also added a query to delete old tokens whose probability is between 0.35
and 0.65.
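
To gauge how much that query would remove, the same condition can be run as a count first. This is a sketch, assuming the cutoff date is today minus the purge interval. In a SELECT, MySQL returns NULL for a division by zero, so tokens with no hits at all are not matched.

```sql
-- Dry run: count tokens that the neutral-probability purge would match.
SET @PurgeNeutral = 90;
SET @today = to_days(current_date());

SELECT COUNT(*) AS neutral_candidates
  FROM dspam_token_data
 WHERE from_days(@today - @PurgeNeutral) > last_hit
   AND spam_hits/(innocent_hits+spam_hits) BETWEEN 0.35 AND 0.65;
```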

As for the transactions, I've always thought that a single SQL query is
always atomic, so there is no need for a transaction around just one query.
Am I wrong?
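
As far as I understand, it depends on the storage engine. With InnoDB and autocommit on, each statement is already its own transaction, while statements on MyISAM tables take effect immediately whether or not they are wrapped in START TRANSACTION and COMMIT. Both settings can be checked like this (table name as used in the script):

```sql
-- Which storage engine backs the token table? (see the Engine column)
SHOW TABLE STATUS LIKE 'dspam_token_data';

-- Is autocommit enabled for this session? (1 = each statement commits on its own)
SELECT @@autocommit;
```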

>> -- Cleanup dictionnaries of passive users

> Such a query should run as one of the first queries. But why do you punish
> users not having reclassified anything?

Then that may be too specific to my setup ... :)


So, how about this new proposal?

Nicolas


--
-- Set some defaults
--
SET @TrainingMode    = 'TEFT';      -- Default training mode
SET @PurgeSignatures = 14;          -- Stale signatures
SET @PurgeNeutral    = 90;          -- Tokens with neutralish probabilities
SET @PurgeUnused     = 90;          -- Unused tokens
SET @PurgeHapaxes    = 30;          -- Tokens with less than 5 hits (hapaxes)
SET @PurgeHits1S     = 15;          -- Tokens with only 1 spam hit
SET @PurgeHits1I     = 15;          -- Tokens with only 1 innocent hit
SET @today = to_days(current_date());

--
-- Delete tokens with less than 5 hits (hapaxes)
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM dspam_token_data
    WHERE from_days(@today-@PurgeHapaxes) > last_hit
      AND (2*innocent_hits)+spam_hits < 5;
COMMIT;

--
-- Delete tokens with only 1 spam hit
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM dspam_token_data
    WHERE from_days(@today-@PurgeHits1S) > last_hit
      AND innocent_hits = 0 AND spam_hits = 1;
COMMIT;

--
-- Delete tokens with only 1 innocent hit
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM dspam_token_data
    WHERE from_days(@today-@PurgeHits1I) > last_hit
      AND innocent_hits = 1 AND spam_hits = 0;
COMMIT;

--
-- Delete tokens with neutralish probabilities
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM dspam_token_data
    WHERE from_days(@today-@PurgeNeutral) > last_hit
      AND spam_hits/(innocent_hits+spam_hits) BETWEEN 0.35 AND 0.65;
COMMIT;

--
-- Delete unused tokens, except for TOE, TUM and NOTRAIN modes
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM t USING dspam_token_data t
    LEFT JOIN dspam_preferences p ON p.preference = 'trainingMode' AND p.uid = t.uid
    LEFT JOIN dspam_preferences d ON d.preference = 'trainingMode' AND d.uid = 0
  WHERE from_days(@today-@PurgeUnused) > last_hit
    AND COALESCE(p.value,d.value,@TrainingMode) NOT IN ('TOE','TUM','NOTRAIN');
COMMIT;

--
-- Delete TUM tokens seen fewer than 50 times
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM t USING dspam_token_data t
    LEFT JOIN dspam_preferences p ON p.preference = 'trainingMode' AND p.uid = t.uid
    LEFT JOIN dspam_preferences d ON d.preference = 'trainingMode' AND d.uid = 0
  WHERE from_days(@today-@PurgeUnused) > last_hit
    AND COALESCE(p.value,d.value,@TrainingMode) = 'TUM'
    AND innocent_hits + spam_hits < 50;
COMMIT;

--
-- Delete stale signatures
--
START TRANSACTION;
DELETE LOW_PRIORITY QUICK
  FROM dspam_signature_data
  WHERE from_days(@today-@PurgeSignatures) > created_on;
COMMIT;


_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel