-------- Original-Nachricht --------
> Datum: Tue, 28 Jul 2009 06:37:18 -0500
> Von: "Todd T. Fries" <[email protected]>
> An: Steve <[email protected]>
> CC: [email protected]
> Betreff: Re: [Dspam-user] dspam_merge and mysql errors
> Steve,
>
> Maybe dspam_merge might abstract things a bit and use the below for
> mysql? I wonder what the postgresql equivalent would be since I
> use dspam and postgresql.
>
AGggrrr! I could not read your mail without going to read how functions are
done on PostgreSQL. So here is a blind try I coded by reading the PostgreSQL
documentation. I don't know if it works or not? Maybe someone with DSPAM on
PostgreSQL could check it out and tell here if it works or not? I only did the
token merging part. If that would work then doing the statistics merging should
be easy as 1-2-3.
------
CREATE FUNCTION dspam_merge_tokens(uid INTEGER, uids INTEGER[]) RETURNS void
LANGUAGE plpgsql VOLATILE AS
$$
DECLARE
v_rec RECORD;
BEGIN
-- get all tokens from the source uid(s)
FOR v_rec IN SELECT * FROM dspam_token_data
WHERE dspam_token_data.uid IN (SELECT dspam_merge_tokens.uids[i] FROM
GENERATE_SERIES(ARRAY_LOWER(dspam_merge_tokens.uids,1),ARRAY_UPPER(dspam_merge_tokens.uids,1))
s(i))
LOOP
-- first try to update the token for target uid
EXECUTE 'UPDATE dspam_token_data SET spam_hits=spam_hits+' ||
v_rec.spam_hits || ',innocent_hits=innocent_hits+' || v_rec.innocent_hits ||
',last_hit=GREATEST(last_hit,DATE\'' || v_rec.last_hit || '\') WHERE uid=' ||
dspam_merge_tokens.uid || ' AND token=' || v_rec.token;
IF NOT found THEN
-- If updating token failed, try to insert the token
-- if someone else inserts the same token concurrently,
-- we could get a unique-key failure
BEGIN
EXECUTE 'INSERT INTO
dspam_token_data(uid,token,spam_hits,innocent_hits,last_hit) VALUES (' ||
dspam_merge_tokens.uid || ', ' || v_rec.token || ', ' || v_rec.spam_hits || ',
' || v_rec.innocent_hits || ', CURRENT_DATE)';
EXCEPTION WHEN unique_violation THEN
-- do nothing, and loop to try the UPDATE again
END;
END IF;
END LOOP;
RETURN;
END;
$$;
------
I think I could avoid the EXECUTE statement and directly execute the SQL clause
in the function. But I don't know how well that would work? And I think I need
to specify the function to be VOLATILE because I do an update? The two
functions created with stock DSPAM are set to STABLE but I think that is just
because they don't change anything. But again... I don't know.
Can you try that? Following my MySQL statement (the one with merging uid 1 and
2 and 3 into uid 2000) the call to the function should be:
----
SELECT dspam_merge_tokens(2000,ARRAY[1,2,3]);
----
PLEASE do the above on a copy of your original data or create a table with test
data. I don't know if the function works. I just coded it by reading the
PostgreSQL documentation. I could be totally wrong and mess up all your data.
However... let me know if the function worked.
> Thanks,
>
// Steve
> Penned by Steve on 20090705 4:33.57, we have:
> | Hallo Mark,
> |
> | here the SQL clause doing what dspam_merge does. Assuming you would like
> to merge the tokens from USER1, USER2 and USER3 into the data of USER2000.
> |
> | Assuming:
> | - USER2000 has uid 2000
> | - USER1 has uid 1
> | - USER2 has uid 2
> | - USER3 has uid 3
> |
> | Then the SQL clause for merging the tokens would be:
> | INSERT INTO dspam_token_data(uid,token,spam_hits,innocent_hits,last_hit)
> SELECT 2000,token,SUM(spam_hits),SUM(innocent_hits),MAX(last_hit) FROM
> dspam_token_data WHERE uid IN(1,2,3) GROUP BY token ON DUPLICATE KEY UPDATE
> spam_hits=VALUES(spam_hits)+spam_hits,innocent_hits=VALUES(innocent_hits)+innocent_hits,last_hit=IF(VALUES(last_hit)>last_hit,VALUES(last_hit),last_hit);
> |
> |
> | And the SQL clause for cumulating the statistics would be:
> | INSERT INTO
> dspam_stats(uid,spam_learned,innocent_learned,spam_misclassified,innocent_misclassified,spam_corpusfed,innocent_corpusfed,spam_classified,innocent_classified)
> SELECT
> 2000,SUM(spam_learned),SUM(innocent_learned),SUM(spam_misclassified),SUM(innocent_misclassified),SUM(spam_corpusfed),SUM(innocent_corpusfed),SUM(spam_classified),SUM(innocent_classified)
> FROM dspam_stats WHERE uid IN(1,2,3) GROUP BY uid ON DUPLICATE KEY UPDATE
> spam_learned=VALUES(spam_learned)+spam_learned,innocent_learned=VALUES(innocent_learned)+innocent_learned,spam_misclassified=VALUES(spam_misclassified)+spam_misclassified,innocent_misclassified=VALUES(innocent_misclassified)+innocent_misclassified,spam_corpusfed=VALUES(spam_corpusfed)+spam_corpusfed,innocent_corpusfed=VALUES(innocent_corpusfed)+innocent_corpusfed,spam_classified=VALUES(spam_classified)+spam_classified,innocent_classified=VALUES(innocent_classified)+innocent_classified;
> |
> |
> | I know, I know. It's a crazy long SQL clause. Anyway... it should be
> faster then doing it with dspam_merge.
> |
> | Let me know how it went for you.
> |
> |
> | cheers,
> |
> | Steve
> |
> |
> | -------- Original-Nachricht --------
> | > Datum: Fri, 03 Jul 2009 18:27:25 +0200
> | > Von: "Steve" <[email protected]>
> | > An:
> | > CC: [email protected]
> | > Betreff: Re: [Dspam-user] dspam_merge and mysql errors
> |
> | > Hallo Mark,
> | >
> | >
> | > > It does seem to be a somewhat unhappy tool :) If you ask it for
> help,
> | > > it barfs in a nasty way. I can send that output along too if you'd
> like
> | > > that as well.
> | > >
> | > I am going to fix that. I must!
> | >
> | >
> | > > Absolutely acceptable.
> | > >
> | > I am a bit in a rush right now so I am going to send you a SQL clause
> that
> | > does something like dspam_merge but not exactly what dspam_merge does.
> | > dspam_merge is able to merge the tokens from USER1 into the tokens of
> USER2.
> | > The SQL clause from here will merge tokens from USER1, USER2, USERx
> into a
> | > new user USERy.
> | >
> | > Assuming:
> | > - USERy has uid 2000
> | > - USER1 has uid 1
> | > - USER2 has uid 2
> | > - USERx has uid 3
> | >
> | > Then you should first drop all the tokens for USERy:
> | > DELETE FROM dspam_token_data WHERE uid=2000;
> | >
> | > And then you can use one SQL clause to merge the tokens from USER1,
> USER2
> | > and USERx into USERy:
> | >
> | > INSERT INTO dspam_token_data
> (uid,token,spam_hits,innocent_hits,last_hit)
> | > SELECT 2000,token,SUM(spam_hits),SUM(innocent_hits),MAX(last_hit) FROM
> | > dspam_token_data WHERE uid IN (1,2,3) GROUP BY token;
> | >
> | > To be 100% like dspam_merge I would need to do "ON DUPLICATE" but
> right
> | > now I can't test that since I am on the run and I don't want to send
> you
> | > something non working. But I will quickly do the stuff over the
> weekend and
> | > send it here. I have to look if I can glue everything into SQL or if I
> need to
> | > make a quick script? Anyway... I will send the script here to the
> list.
> | > Just give me the weekend to look at it.
> | >
> | >
> | > > pace
> | > >
> | > Steve
> | > --
> | > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
> | > Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
> | >
> | >
> ------------------------------------------------------------------------------
> | > _______________________________________________
> | > Dspam-user mailing list
> | > [email protected]
> | > https://lists.sourceforge.net/lists/listinfo/dspam-user
> |
> | --
> | GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
> | Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
> |
> |
> ------------------------------------------------------------------------------
> | _______________________________________________
> | Dspam-user mailing list
> | [email protected]
> | https://lists.sourceforge.net/lists/listinfo/dspam-user
>
> --
> Todd Fries .. [email protected]
>
> _____________________________________________
> | \ 1.636.410.0632 (voice)
> | Free Daemon Consulting, LLC \ 1.405.227.9094 (voice)
> | http://FreeDaemonConsulting.com \ 1.866.792.3418 (FAX)
> | "..in support of free software solutions." \ sip:[email protected]
> | \ sip:[email protected]
> \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
> 37E7 D3EB 74D0 8D66 A68D B866 0326 204E 3F42 004A
> http://todd.fries.net/pgp.txt
--
Neu: GMX Doppel-FLAT mit Internet-Flatrate + Telefon-Flatrate
für nur 19,99 Euro/mtl.!* http://portal.gmx.net/de/go/dsl02
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now. http://p.sf.net/sfu/bobj-july
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user