-------- Original-Nachricht --------
> Datum: Tue, 28 Jul 2009 06:37:18 -0500
> Von: "Todd T. Fries" <[email protected]>
> An: Steve <[email protected]>
> CC: [email protected]
> Betreff: Re: [Dspam-user] dspam_merge and mysql errors

> Steve,
> 
> Maybe dspam_merge might abstract things a bit and use the below for
> mysql?  I wonder what the postgresql equivalent would be since I
> use dspam and postgresql.
> 
AGggrrr! I could not read your mail without going to read how functions are 
done on PostgreSQL. So here is a blind try I coded by reading the PostgreSQL 
documentation. I don't know if it works or not? Maybe someone with DSPAM on 
PostgreSQL could check it out and tell here if it works or not? I only did the 
token merging part. If that would work then doing the statistics merging should 
be easy as 1-2-3.

------
CREATE FUNCTION dspam_merge_tokens(uid INTEGER, uids INTEGER[]) RETURNS void 
LANGUAGE plpgsql VOLATILE AS
$$
DECLARE
  v_rec RECORD;
BEGIN
  -- get all tokens from the source uid(s)
  FOR v_rec IN SELECT * FROM dspam_token_data
    WHERE dspam_token_data.uid IN (SELECT dspam_merge_tokens.uids[i] FROM 
GENERATE_SERIES(ARRAY_LOWER(dspam_merge_tokens.uids,1),ARRAY_UPPER(dspam_merge_tokens.uids,1))
 s(i))
  LOOP
    -- first try to update the token for target uid
    EXECUTE 'UPDATE dspam_token_data SET spam_hits=spam_hits+' || 
v_rec.spam_hits || ',innocent_hits=innocent_hits+' || v_rec.innocent_hits || 
',last_hit=GREATEST(last_hit,DATE\'' || v_rec.last_hit || '\') WHERE uid=' || 
dspam_merge_tokens.uid || ' AND token=' || v_rec.token;
    IF NOT found THEN
      -- If updating token failed, try to insert the token
      -- if someone else inserts the same token concurrently,
      -- we could get a unique-key failure
      BEGIN
        EXECUTE 'INSERT INTO 
dspam_token_data(uid,token,spam_hits,innocent_hits,last_hit) VALUES (' || 
dspam_merge_tokens.uid || ', ' || v_rec.token || ', ' || v_rec.spam_hits || ', 
' || v_rec.innocent_hits || ', CURRENT_DATE)';
      EXCEPTION WHEN unique_violation THEN
        -- do nothing, and loop to try the UPDATE again
      END;
    END IF;
  END LOOP;
  RETURN;
END;
$$;
------

I think I could avoid the EXECUTE statement and directly execute the SQL clause 
in the function. But I don't know how well that would work? And I think I need 
to specify the function to be VOLATILE because I do an update? The two 
functions created with stock DSPAM are set to STABLE but I think that is just 
because they don't change anything. But again... I don't know.

Can you try that? Following my MySQL statement (the one with merging uid 1 and 
2 and 3 into uid 2000) the call to the function should be:
----
SELECT dspam_merge_tokens(2000,ARRAY[1,2,3]);
----


PLEASE do the above on a copy of your original data or create a table with test 
data. I don't know if the function works. I just coded it by reading the 
PostgreSQL documentation. I could be totally wrong and mess up all your data. 
However... let me know if the function worked.


> Thanks,
> 
// Steve


> Penned by Steve on 20090705  4:33.57, we have:
> | Hallo Mark,
> | 
> | here the SQL clause doing what dspam_merge does. Assuming you would like
> to merge the tokens from USER1, USER2 and USER3 into the data of USER2000.
> | 
> | Assuming:
> |  - USER2000 has uid 2000
> |  - USER1 has uid 1
> |  - USER2 has uid 2
> |  - USER3 has uid 3
> | 
> | Then the SQL clause for merging the tokens would be:
> | INSERT INTO dspam_token_data(uid,token,spam_hits,innocent_hits,last_hit)
> SELECT 2000,token,SUM(spam_hits),SUM(innocent_hits),MAX(last_hit) FROM
> dspam_token_data WHERE uid IN(1,2,3) GROUP BY token ON DUPLICATE KEY UPDATE
> spam_hits=VALUES(spam_hits)+spam_hits,innocent_hits=VALUES(innocent_hits)+innocent_hits,last_hit=IF(VALUES(last_hit)>last_hit,VALUES(last_hit),last_hit);
> | 
> | 
> | And the SQL clause for cumulating the statistics would be:
> | INSERT INTO
> dspam_stats(uid,spam_learned,innocent_learned,spam_misclassified,innocent_misclassified,spam_corpusfed,innocent_corpusfed,spam_classified,innocent_classified)
> SELECT
> 2000,SUM(spam_learned),SUM(innocent_learned),SUM(spam_misclassified),SUM(innocent_misclassified),SUM(spam_corpusfed),SUM(innocent_corpusfed),SUM(spam_classified),SUM(innocent_classified)
> FROM dspam_stats WHERE uid IN(1,2,3) GROUP BY uid ON DUPLICATE KEY UPDATE
> spam_learned=VALUES(spam_learned)+spam_learned,innocent_learned=VALUES(innocent_learned)+innocent_learned,spam_misclassified=VALUES(spam_misclassified)+spam_misclassified,innocent_misclassified=VALUES(innocent_misclassified)+innocent_misclassified,spam_corpusfed=VALUES(spam_corpusfed)+spam_corpusfed,innocent_corpusfed=VALUES(innocent_corpusfed)+innocent_corpusfed,spam_classified=VALUES(spam_classified)+spam_classified,innocent_classified=VALUES(innocent_classified)+innocent_classified;
> | 
> | 
> | I know, I know. It's a crazy long SQL clause. Anyway... it should be
> faster then doing it with dspam_merge.
> | 
> | Let me know how it went for you.
> | 
> | 
> | cheers,
> | 
> | Steve
> | 
> | 
> | -------- Original-Nachricht --------
> | > Datum: Fri, 03 Jul 2009 18:27:25 +0200
> | > Von: "Steve" <[email protected]>
> | > An: 
> | > CC: [email protected]
> | > Betreff: Re: [Dspam-user] dspam_merge and mysql errors
> | 
> | > Hallo Mark,
> | > 
> | > 
> | > > It does seem to be a somewhat unhappy tool :)  If you ask it for
> help,
> | > > it barfs in a nasty way.  I can send that output along too if you'd
> like
> | > > that as well.
> | > > 
> | > I am going to fix that. I must!
> | > 
> | > 
> | > > Absolutely acceptable.
> | > > 
> | > I am a bit in a rush right now so I am going to send you a SQL clause
> that
> | > does something like dspam_merge but not exactly what dspam_merge does.
> | > dspam_merge is able to merge the tokens from USER1 into the tokens of
> USER2.
> | > The SQL clause from here will merge tokens from USER1, USER2, USERx
> into a
> | > new user USERy.
> | > 
> | > Assuming:
> | >  - USERy has uid 2000
> | >  - USER1 has uid 1
> | >  - USER2 has uid 2
> | >  - USERx has uid 3
> | > 
> | > Then you should first drop all the tokens for USERy:
> | > DELETE FROM dspam_token_data WHERE uid=2000;
> | > 
> | > And then you can use one SQL clause to merge the tokens from USER1,
> USER2
> | > and USERx into USERy:
> | > 
> | > INSERT INTO dspam_token_data
> (uid,token,spam_hits,innocent_hits,last_hit)
> | > SELECT 2000,token,SUM(spam_hits),SUM(innocent_hits),MAX(last_hit) FROM
> | > dspam_token_data WHERE uid IN (1,2,3) GROUP BY token;
> | > 
> | > To be 100% like dspam_merge I would need to do "ON DUPLICATE" but
> right
> | > now I can't test that since I am on the run and I don't want to send
> you
> | > something non working. But I will quickly do the stuff over the
> weekend and
> | > send it here. I have to look if I can glue everything into SQL or if I
> need to
> | > make a quick script? Anyway... I will send the script here to the
> list.
> | > Just give me the weekend to look at it.
> | > 
> | > 
> | > > pace
> | > >
> | > Steve
> | > -- 
> | > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
> | > Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
> | > 
> | >
> ------------------------------------------------------------------------------
> | > _______________________________________________
> | > Dspam-user mailing list
> | > [email protected]
> | > https://lists.sourceforge.net/lists/listinfo/dspam-user
> | 
> | -- 
> | GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT!
> | Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
> | 
> |
> ------------------------------------------------------------------------------
> | _______________________________________________
> | Dspam-user mailing list
> | [email protected]
> | https://lists.sourceforge.net/lists/listinfo/dspam-user
> 
> -- 
> Todd Fries .. [email protected]
> 
>  _____________________________________________
> |                                             \  1.636.410.0632 (voice)
> | Free Daemon Consulting, LLC                 \  1.405.227.9094 (voice)
> | http://FreeDaemonConsulting.com             \  1.866.792.3418 (FAX)
> | "..in support of free software solutions."  \  sip:[email protected]
> |                                             \  sip:[email protected]
>  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>                                                  
>               37E7 D3EB 74D0 8D66 A68D  B866 0326 204E 3F42 004A
>                         http://todd.fries.net/pgp.txt

-- 
Neu: GMX Doppel-FLAT mit Internet-Flatrate + Telefon-Flatrate
für nur 19,99 Euro/mtl.!* http://portal.gmx.net/de/go/dsl02

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user

Reply via email to