Hello,

first of all, thank you for having permitted dspam to survive after some month
of inactivity !

I've just checked the latest source for purge-4.1.sql script,
and I think I have a better one to contribute, see below.
It does exactly the same thing (well, 2 or 3 differences), but takes into
account the default trainingMode - something which is missing in the actual
script !

I've also added an index on 3 columns (last_hit, spam_hits, innocent_hits in
this order) on dspam_token_data ; rather than 3 indexes on 1 columns (the
same) as you recommand in the documentation.

I also think that the last line in latest purge-4.1.sql script should not be
here (optimize table ...) :
optimizing can be very time consumming for large data sets !
personnaly, I run my purge.sql every day, but optimize only once a month.

hope it helps

Regards
Nicolas



---- my purge.sql script ----

SET @d='TEFT'; -- Default mode as defined in dspam.conf
SET @today=to_days(current_date());
SET @stale=30;

-- Delete fairly neutral tokens older than 2...@stale days
DELETE LOW_PRIORITY QUICK
FROM dspam_token_data
WHERE 2*innocent_hits + spam_hits < 5
  AND from_days(@today...@stale) > last_hit;

-- Delete tokens seen only one time in the past @stale days
DELETE LOW_PRIORITY QUICK
FROM dspam_token_data
WHERE innocent_hits + spam_hits = 1
  AND innocent_hits BETWEEN 0 AND 1
  AND spam_hits BETWEEN 0 AND 1
  AND from_days(@tod...@stale) > last_hit;

-- Delete all tokens never seen in the last 3...@stale days, except for TOE, TUM
and NOTRAIN modes
DELETE LOW_PRIORITY QUICK
FROM t USING dspam_token_data t
  LEFT JOIN dspam_preferences p
    ON t.uid = p.uid AND p.preference = 'trainingMode'
WHERE from_days(@today...@stale) > last_hit
  AND COALESCE(p.value,@d) NOT IN ('TOE','TUM','NOTRAIN');

-- Delete TUM tokens seen no more than 50 times in the last 3...@stale days
DELETE LOW_PRIORITY QUICK
FROM t USING dspam_token_data t
  LEFT JOIN dspam_preferences p
    ON t.uid = p.uid AND p.preference = 'trainingMode'
WHERE from_days(@today...@stale) > last_hit
  AND innocent_hits + spam_hits < 50
  AND COALESCE(p.value,@d) = 'TUM';

-- Delete signatures older than @stale days
DELETE LOW_PRIORITY QUICK
FROM dspam_signature_data
WHERE from_days(@tod...@stale) > created_on;

-- Cleanup dictionnaries of passive users
DELETE LOW_PRIORITY QUICK
FROM t USING dspam_token_data t
  JOIN dspam_stats s
    ON t.uid=s.uid
WHERE from_days(@tod...@stale) > last_hit
  AND s.spam_misclassified=0
  AND s.innocent_misclassified=0;




------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev 
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel

Reply via email to