From: Sidney Markowitz [mailto:[EMAIL PROTECTED]]
> 
> Giampaolo Tomassoni wrote, On 14/12/06 12:35 AM:
> > Also, I have a question which is loosely related to this. Why do tokens
> > get hashed before being stored to / retrieved from the db?
> 
> I'm not qualified to answer your first questions, but I can deal with
> this one.
> 
> When tokens were stored as plain text and we made the decision to change
> it, the average size of a token was 12 bytes. We now use a 40-bit hash
> of the token, stored as a CHAR(5) field in the database, which takes up
> much less space than a variable-length string averaging 12 bytes.
> The database is much smaller and access is faster, but the tradeoff
> is that we can no longer dump the Bayes token database in plain text.

Oh, I see: it was just a matter of db size and speed. I thought there was 
some other reason, since there is no way (an option, for example) to store 
tokens "in the clear" for cases where speed and size are not an issue and one 
would like an in-depth view of what's going on.
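
For reference, the hashing you describe boils down to something like this, if 
I understand it right (a sketch; I'm assuming Digest::SHA, and whether the 
leading or trailing 5 bytes of the digest are kept is my guess, not taken 
from the real code):

  use Digest::SHA qw(sha1);

  # Reduce a plain-text token to 40 bits (5 raw bytes).  Which 5 bytes
  # of the 20-byte SHA-1 digest are kept is an implementation detail;
  # this sketch keeps the trailing ones.
  sub hash_token {
      my ($token) = @_;
      return substr(sha1($token), -5);   # 5 bytes = 40 bits
  }

  my $hashed = hash_token("viagra");     # 5 bytes of raw binary data

Note the result is raw binary rather than printable text, which is also why 
the token database can no longer be dumped in plain text.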

Also, I see that the BayesStore::PgSQL module uses the BYTEA datatype for 
tokens. This is probably not the most efficient choice when speed is a 
concern: referencing BYTEA fields may require a further lookup by the 
PostgreSQL engine, since they are meant for blobs.
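
If I read DBD::Pg right, querying a BYTEA column also means every binary 
token value must be bound with an explicit type hint, roughly like this (a 
sketch; the bayes_token table and its column names are my assumption about 
the schema, not taken from the module):

  use DBI;
  use DBD::Pg qw(:pg_types);
  use Digest::SHA qw(sha1);

  my $dbh = DBI->connect("dbi:Pg:dbname=spamassassin", "sa", "",
                         { RaiseError => 1 });

  # Look up the counts for one hashed token; the 5-byte binary value
  # has to be bound as BYTEA so NUL and non-ASCII bytes survive.
  my $sth = $dbh->prepare(
      "SELECT spam_count, ham_count FROM bayes_token WHERE token = ?");
  $sth->bind_param(1, substr(sha1("viagra"), -5),
                   { pg_type => PG_BYTEA });
  $sth->execute;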

Many thanks for replying to my previous post.

giampaolo


> 
>  -- sidney
