[Bug 3771] PostgreSQL Specific Bayes Storage Module

bugzilla-daemon 19 Nov 2004 07:42:15 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3771

------- Additional Comments From [EMAIL PROTECTED]  2004-11-18 23:42 -------
On 11/18/2004 10:48 PM, [EMAIL PROTECTED] wrote:

>>Leave it as bytea...
> 
> Interesting, I think my main concern was the fact that BYTEA was the
> only way to make sure you got any trailing whitespace (which we do
> get) so it had to be used.  Like I said, I'm far from the postgresql
> expert so I'm gladly proven wrong.

Given that we can't guarantee the db encoding (someone mentioned that RH Fedora
Core ships with encoding enabled), we're best off using bytea.  Ignore that I
brought this up. :)
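
To make the encoding concern concrete, here is a rough Python sketch (not
SpamAssassin code, and no database involved).  The token values are made up,
and the helper names are hypothetical.  It only shows that raw token bytes do
not always survive being treated as encoded text, and that a blank-padded type
in the style of CHAR(n) would not distinguish a trailing space when comparing.

```python
# Hypothetical token values; real Bayes tokens are whatever the tokenizer emits.
tokens = [b"viagra ", b"caf\xe9", b"\xff\xfe\x00"]


def survives_as_text(raw: bytes, encoding: str = "utf-8") -> bool:
    """Return True if raw decodes in the given encoding and round-trips unchanged."""
    try:
        return raw.decode(encoding).encode(encoding) == raw
    except UnicodeDecodeError:
        return False


def survives_padded_compare(raw: bytes) -> bool:
    """Mimic a type that ignores trailing spaces when comparing (CHAR(n)-style)."""
    return raw.rstrip(b" ") == raw


for raw in tokens:
    # Print the token, whether it is valid text, and whether padding would hide it.
    print(raw, survives_as_text(raw), survives_padded_compare(raw))
```

A binary column sidesteps both questions because the bytes are stored and
compared exactly as given.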

>>1) Why not a unique index that mimics the primary key (though do it in
>>token,id order not id,token)?  Won't matter in my case (since I run as
>>one user) and probably doesn't matter at all unless running with lots 'n
>>lots of users...
> 
> 
> Didn't realize it was necessary.

On second pass, it isn't.  I just started perusing the statistics tables in my
system and found that there were two sets of indexes: the ones for the foreign
key and the ones I created manually.  The system-created PK index is hidden (at
least in pgAdmin) -- my mistake.

In any case, the system index is built on the order of the keys -- best to swap
the keys (token,id) and (seen,id).

Given we have a unique index on these fields, and in the right order, we should
be OK as is.
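
One general reason key order can matter is that a composite index is most
useful for lookups that constrain its leading column.  The sketch below uses
SQLite from the Python standard library as a stand-in, so it is an assumption
that PostgreSQL behaves comparably; its planner is different and should be
checked with its own EXPLAIN.  The toy table and the token-only query are
hypothetical; if the real queries always constrain both id and token, the order
matters much less.

```python
import sqlite3


def plan_for(pk_columns: str) -> str:
    """Build a toy token table with the given primary-key column order and
    return SQLite's query plan for a lookup by token alone."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE bayes_token ("
        " id INTEGER NOT NULL,"
        " token BLOB NOT NULL,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        f" PRIMARY KEY ({pk_columns}))"
    )
    rows = con.execute(
        "EXPLAIN QUERY PLAN SELECT spam_count FROM bayes_token WHERE token = ?",
        (b"abc",),
    ).fetchall()
    con.close()
    # The last column of each plan row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)


print("PK (token, id):", plan_for("token, id"))
print("PK (id, token):", plan_for("id, token"))
```

With token first, the plan is an index search; with id first, the same query
falls back to a scan.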

>>2) bayes_seen.msgid should be type 'text' -- sa-learn (and others) don't
>>truncate to 200.
> 
> 
> We should just truncate in the code, maybe it needs to be a little
> bigger but add a hard substr to the code anyway.

For fields under 255 chars there is no penalty (or storage weirdness) using text
vs varchar(200).  Postgres stores it as a 1-byte length and then the data, and
the field is no longer than that.  If it goes over then I believe it moves the
data to the toast table -- so a slight penalty there.  I think I saw 5 greater
than 200 chars out of 202863.  dbm obviously stores the full length.  It is
mysql that silently ignores the excess (or so I'm told, I can't verify).
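
For reference, the "hard substr" idea from the quoted suggestion amounts to
something like the following.  This is a Python illustration only, not the
module's actual code; the function name is hypothetical and the limit of 200 is
assumed from the varchar(200) mentioned above.

```python
MAX_MSGID_LEN = 200  # assumed column width, from the varchar(200) under discussion


def clamp_msgid(msgid: str, limit: int = MAX_MSGID_LEN) -> str:
    """Hard-truncate a message id so it always fits a fixed-width column."""
    return msgid[:limit]


# A short id passes through untouched; an oversized one is cut to the limit.
print(len(clamp_msgid("x" * 250)))
print(clamp_msgid("short@example.invalid"))
```

The trade-off is that two long ids sharing the same first 200 characters would
become indistinguishable, which an unbounded text column avoids.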

>>3) I also get differences in the backup file.
[snip]

> This is a problem, see the bug for a short discussion.  There is for
> sure some differences in output that should not be there.

I did another run with debugging on and noticed that some of the seen lines got
discarded.  That might account for the difference when strictly looking at file
sizes.
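
Rather than comparing file sizes, the two dumps can be compared line by line.
A rough Python sketch follows.  It assumes each backup line is tab-separated
with a record type in the first field (for example "t" for token lines and "s"
for seen lines); that layout and the sample lines are assumptions for
illustration, and the helper names are hypothetical.

```python
from collections import Counter


def summarize(lines):
    """Count non-empty lines by their first tab-separated field (assumed record type)."""
    return Counter(line.split("\t", 1)[0] for line in lines if line.strip())


def missing_lines(old_lines, new_lines):
    """Return lines present in the old dump but absent from the new one,
    treating each dump as a multiset so duplicates are accounted for."""
    old = Counter(line.rstrip("\n") for line in old_lines)
    new = Counter(line.rstrip("\n") for line in new_lines)
    return sorted((old - new).elements())


# Made-up sample data standing in for two backup files.
old_dump = ["t\t1\t0\t100\tabc\n", "s\ts\tmsg1@example\n", "s\th\tmsg2@example\n"]
new_dump = ["t\t1\t0\t100\tabc\n", "s\ts\tmsg1@example\n"]

print(summarize(old_dump))
print(summarize(new_dump))
print(missing_lines(old_dump, new_dump))
```

Per-type counts show quickly whether only the seen lines differ, and the
missing-line list names exactly which ones were dropped.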

> I started running the auto analyzer deal to keep the statistics
> up-to-date, this helps keep from trailing off later in the run.

Ah, I'll play on the next import (one index, just the PK one).

-- 
 -Rupa

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
