http://bugzilla.spamassassin.org/show_bug.cgi?id=3771
------- Additional Comments From [EMAIL PROTECTED]  2004-11-18 23:42 -------
On 11/18/2004 10:48 PM, [EMAIL PROTECTED] wrote:

>>Leave it as bytea...
>
> Interesting, I think my main concern was the fact that BYTEA was the
> only way to make sure you got any trailing whitespace (which we do
> get) so it had to be used.  Like I said, I'm far from the postgresql
> expert so I'm gladly proven wrong.

Given that we can't guarantee db encoding (someone mentioned that RH
Fedora Core ships with encoding enabled) we're best off using bytea.
Ignore that I brought this up. :)

>>1) Why not a unique index that mimics the primary key (though do it in
>>token,id order not id,token)?  Won't matter in my case (since I run as
>>one user) and probably doesn't matter at all unless running with lots 'n
>>lots of users...
>
> Didn't realize it was necessary.

On second pass, it isn't.  I just started perusing the statistics tables
in my system and found that there were two sets of indexes: the ones
for the foreign key and the ones I created manually.  The system-created
PK index is hidden (at least in pgAdmin) -- my mistake.  In any case,
the system index is built on the order of the keys -- best to swap the
keys to (token,id) and (seen,id).  Given we have a unique index on these
fields and in the right order, we should be OK as is.

>>2) bayes_seen.msgid should be type 'text' -- sa-learn (and others) don't
>>truncate to 200.
>
> We should just truncate in the code, maybe it needs to be a little
> bigger but add a hard substr to the code anyway.

For fields under 255 chars there is no penalty (or storage weirdness)
using text vs varchar(200).  Postgres stores it as a 1-byte length
followed by the data, and the field is no longer than that.  If it goes
over, then I believe it moves the data to the TOAST table -- so a slight
penalty there.  I think I saw 5 greater than 200 chars out of 202863.
dbm obviously stores the full length.  It is mysql that silently ignores
the excess (or so I'm told, I can't verify).
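
For illustration only, the "hard substr" suggested in the quoted text
might look roughly like the sketch below.  It is written in Python
rather than the Perl that SpamAssassin uses, and the names
MAX_MSGID_LEN and clamp_msgid are hypothetical, not taken from the
SpamAssassin code; the limit of 200 simply mirrors the varchar(200)
column discussed above.

```python
# Hypothetical limit, mirroring the varchar(200) column discussed above.
MAX_MSGID_LEN = 200


def clamp_msgid(msgid: str, limit: int = MAX_MSGID_LEN) -> str:
    """Return msgid cut to at most `limit` characters (a hard substr)."""
    return msgid[:limit]


if __name__ == "__main__":
    # A made-up, over-long message id used only to exercise the function.
    long_id = "<" + "x" * 300 + "@example.com>"
    print(len(clamp_msgid(long_id)))
    print(clamp_msgid("<short@example.com>"))
```

One trade-off of truncating: two distinct msgids that share their first
200 characters would map to the same stored value, which a 'text'
column would avoid.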

>>3) I also get differences in the backup file.
[snip]
> This is a problem, see the bug for a short discussion.  There is for
> sure some differences in output that should not be there.

I did another run with debugging on and noticed that some of the seen
lines got discarded.  That might account for the difference when
strictly looking at file sizes.

> I started running the auto analyzer deal to keep the statistics
> up-to-date, this helps keep from trailing off later in the run.

Ah, I'll play with that on the next import (one index, just the PK one).

--
-Rupa

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.