Mick,

I was reading my mail from most recent to less recent. If you
are the one who should receive patches, I have attached my
PostgreSQL 8+ performance patch for group performance. It is
based on the DSPAM 3.6.2 update to the driver. Please consider
applying it for the 3.8.1 release. I would be happy to answer
any questions and there is a longish thread in the mailing
list archives from when I submitted it originally. Thank you
for your consideration.

Cheers,
Ken

On Fri, Feb 01, 2008 at 01:47:25PM -0800, Mick Johnson wrote:
> All
> 
> A few pending updates have been pushed up to CVS :
> 
> * Allow users to select multiple rows by clicking
> on the initial row, holding shift, and clicking on the final row.
> * Adds a "select 200" button to the quarantine page. 
> * Removed some junk from a previous merge.
> 
> The Feature Request page has also been updated - the donations button and
> dollar value components have been removed as we're no longer accepting
> donations for this project. 
> 
> A few of the patches came in via this interface, this actually makes it
> harder to patch as a) I don't know who submitted them and can't bring up any
> suggestions or corrections directly, and b) they have less visibility on
> this list. In the future, if you wish to submit patches (and I'm always
> happy when people do) please do so here to ensure the community can review.
> 
> The latest CVS version seems to have been running stably for some time now
> and I'm looking to push this out as a stable 3.8.1 this month unless I hear
> otherwise. 
> 
> Finally, looking forward to a great 2008!
> 
> Cheers
> Mick Johnson
> Sensory Networks
> 
> 
-----------previous patch posting to dspam-users/dspam-dev--------
Dear dspam-users:

I sent this patch update to the dspam-dev list, but I never
saw it. It should be useful to others in the DSPAM community
that are or would like to use PostgreSQL as the DB backend.
The patch is against version 3.6.2 but should apply fairly
easily to any of the 3.6.x series. 

Ken Marshall

----- Forwarded message from Kenneth Marshall <[EMAIL PROTECTED]> -----

Date: Sun, 26 Feb 2006 16:49:16 -0600
From: Kenneth Marshall <[EMAIL PROTECTED]>
To: Rustam Aliyev <[EMAIL PROTECTED]>
Cc: Jonathan Zdziarski <[EMAIL PROTECTED]>,
        dspam-dev@lists.nuclearelephant.com,
Subject: Re: [dspam-dev] PostgreSQL Performance Patch

Jonathan and Rustam,

We are still having problems with the delicateness of MySQL
as a backend DB. The problem is not with DSPAM at all but that
we are having problems scaling MySQL to the number of users and
still be able to provide a backup DB server, in case the primary
server has an outage. Also, the database cleaning and optimization
shutsdown the processing while it is running when using a MYISAM
DB. With the updated PostgreSQL driver in DSPAM and the release
of version 8.1, we are again testing the PostgreSQL backend.

I had high expectations for the combo of DSPAM 3.6 and PostgreSQL
8.1, but while it easily handled much more than 3.2/3.4 and 8.0
it still took too long to process messages. The basic token select
took seconds to perform. I finally tracked the problem down to the
fact that the updated 8.x query is only used if you are not using
the merged group. Without the merged/global group the selects took
approximately 20ms with it they took 3000ms.

I generated the attached patch which uses the same technique to
speed the selects for both the uid and gid. With this patch, my
test queries returned identical results, but in 22ms versus almost
5900ms for the unpatched code. Could you please take a look at it
and consider it for inclusion in the next point release.

Yours truly,
Ken Marshall

On Wed, Aug 03, 2005 at 10:42:57AM +0500, Rustam Aliyev wrote:
> Here's the patch. Should work, tested.
> 
> There's also alternative way (just for discussion): maybe it would be 
> better to create two different "lookup_tokens()" functions for 7.x and 
> 8.x versions?
> 
> 
> Jonathan Zdziarski wrote:
> 
> >so what do you think rustam about checking the version and adjusting  
> >the query accordingly?
> >
> >Jonathan
> 
> 

> --- pgsql_drv.c.jz    Wed Aug  3 10:09:18 2005
> +++ pgsql_drv.c       Wed Aug  3 10:16:10 2005
> @@ -498,8 +498,15 @@
>             "FROM dspam_token_data WHERE uid IN ('%d','%d') AND token IN (",
>             uid, gid);
>    } else {
> -    snprintf (scratch, sizeof (scratch),
> -              "SELECT * FROM lookup_tokens(%d, '{", uid);
> +    if (PQserverVersion(s->dbh) > 80000) {
> +      snprintf (scratch, sizeof (scratch),
> +                "SELECT * FROM lookup_tokens(%d, '{", uid);
> +    } else {
> +      snprintf (scratch, sizeof (scratch),
> +                "SELECT uid, token, spam_hits, innocent_hits "
> +                "FROM dspam_token_data WHERE uid = '%d' AND token IN (",
> +                uid);
> +    }
>    }
>  
>    buffer_cat (query, scratch);
> @@ -520,10 +527,14 @@
>    }
>    ds_diction_close(ds_c);
>  
> -  if (gid != uid) 
> +  if (PQserverVersion(s->dbh) > 80000) {
> +    if (gid != uid) 
> +      buffer_cat (query, ")");
> +    else
> +      buffer_cat(query, "}')");
> +  } else {
>      buffer_cat (query, ")");
> -  else
> -    buffer_cat(query, "}')");
> +  }
>  
>  #ifdef VERBOSE
>    LOGDEBUG ("pgsql query length: %ld\n", query->used);


--- pgsql_drv.c_362     2006-02-26 16:25:29.266084245 -0600
+++ pgsql_drv.c 2006-02-24 14:17:24.717817917 -0600
@@ -490,10 +490,15 @@
   }
 
   if (gid != uid) {
-    snprintf (scratch, sizeof (scratch),
-              "SELECT uid, token, spam_hits, innocent_hits "
-              "FROM dspam_token_data WHERE uid IN ('%d','%d') AND token IN (",
-              uid, gid);
+    if (s->pg_major_ver >= 8) {
+      snprintf (scratch, sizeof (scratch),
+                "SELECT * FROM lookup_tokens(%d, %d, '{", uid, gid);
+    } else {
+      snprintf (scratch, sizeof (scratch),
+                "SELECT uid, token, spam_hits, innocent_hits "
+                "FROM dspam_token_data WHERE uid IN ('%d','%d') AND token IN 
(",
+                uid, gid);
+    }
   } else {
     if (s->pg_major_ver >= 8) {
       snprintf (scratch, sizeof (scratch),
@@ -525,10 +530,7 @@
   ds_diction_close(ds_c);
 
   if (s->pg_major_ver >= 8) {
-    if (gid != uid) 
-      buffer_cat (query, ")");
-    else
-      buffer_cat(query, "}')");
+    buffer_cat(query, "}')");
   } else {
     buffer_cat (query, ")");
   }

--- pgsql_objects.sql_362       2006-02-26 16:25:52.636539923 -0600
+++ pgsql_objects.sql   2006-02-24 12:56:51.523174334 -0600
@@ -74,3 +74,29 @@
   return;
 end;';
 
+create function lookup_tokens(integer,integer,bigint[])
+  returns setof dspam_token_data
+  language plpgsql stable
+  as '
+declare
+  v_rec record;
+begin
+  for v_rec in select * from dspam_token_data
+                where uid=$1
+                  and token in (select $3[i]
+                                  from generate_series(array_lower($3,1),
+                                                       array_upper($3,1)) s(i))
+  loop
+    return next v_rec;
+  end loop;
+  for v_rec in select * from dspam_token_data
+                where uid=$2
+                  and token in (select $3[i]
+                                  from generate_series(array_lower($3,1),
+                                                       array_upper($3,1)) s(i))
+  loop
+    return next v_rec;
+  end loop;
+  return;
+end;';
+


----- End forwarded message -----

Reply via email to