Re: BayesStore::SQL question

Michael Parker Wed, 13 Dec 2006 09:48:17 -0800

Giampaolo Tomassoni wrote:
> Dears,
> 
> actually, I see the Bayes database in SA can be either per-user or
> system-wide.
> 
> I would like to have a way to store bayes tokens on a per-user basis,
> and fetch them in a more system-wide (or perhaps domain-wide) way.
>


Without going much further, you can fake domain-level bayes if you
want.  You just have to set bayes_sql_override_username in domain-based
SQL user_prefs.

> My intention is to have each user's bayes contribute to scoring
> every other user's incoming mail, while still letting each user's db
> be prominent in scoring mails delivered to the user's mailbox.
> 
> To accomplish this, I would probably need to write my own version of
> the BayesStore::(My|Pg)SQL, but I'm facing a problem: how can I get
> the message's recipients from a subclass of BayesStore::SQL? I see a
> $self::_userid defined, but it seems to be meant to store the
> username used to access the db (and it is a scalar, not an array or
> something like that, which may be needed if the message is addressed
> to multiple recipients).

It's not the username to connect to the database, it's the username that
will be used for all lookups in the database.  It's private and
calculated based on the username variable in the main SpamAssassin object.

What you're asking for is data that is not user specific.  You would
have to obtain your data elsewhere.  SpamAssassin has no knowledge of who
the recipients of a message are.  The best you could do is parse the
message itself, looking at To/Cc, but we all know that won't really work.
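
To illustrate why header parsing falls short, here is a minimal Python
sketch (not SpamAssassin code; the sample message, the addresses and the
helper name header_recipients are made up for illustration).  It collects
the addresses named in To/Cc.  Anyone who received the message via Bcc, an
alias or a mailing-list expansion never shows up in those headers, so the
result is not the real delivery list.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message: assume carol@example.com was Bcc'd, so her
# address appears nowhere in the headers the filter gets to see.
RAW = """\
From: sender@example.org
To: Alice <alice@example.com>, list@example.net
Cc: bob@example.com
Subject: test

body
"""

def header_recipients(raw_message):
    """Return addresses named in To/Cc (NOT the envelope recipients)."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr for _name, addr in getaddresses(fields)]

recipients = header_recipients(RAW)
print(recipients)
```

The list address in To also shows the other half of the problem: a header
address may stand for many mailboxes, or for none that you host.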

The way you want to gather data is really a departure from how things
are done now; you would most likely have to throw out just about
everything and replace it with a modified version.

> 
> Also, I have a question which is loosely related to this. Why do
> tokens get hashed before being stored in/retrieved from the db?
> Wouldn't it be better to have them in the clear, to allow, for
> example, an easy identification of the "really spam words" which
> could be used to build rules further penalizing spam messages?

Sidney touched briefly on this.  I'll add a little more.  Indeed, it's all
about speed improvements and performance.

Lots of discussion happened around this when I made the change.  I tried
to keep the option of allowing clear text as well, but (and my memory
might be failing me here) allowing that option cost about a 12% drop in
performance.  The compromise was several plugin hooks that let you build
a separate database of the clear-text tokens.  I believe I posted a
proof-of-concept plugin at the time to show that it would work.

Also, in PostgreSQL, the column is BYTEA because it is binary data and
otherwise you get token corruption.
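
A minimal Python sketch of why the stored value is binary.  It assumes
each token is reduced to the last five bytes of its SHA-1 digest; that
detail is not stated in this thread, so treat it as an assumption and
check the Bayes code for the exact scheme.  The function name hash_token
is made up for illustration.

```python
import hashlib

def hash_token(token: str) -> bytes:
    """Reduce a token to a short fixed-length binary key.

    Assumption: the key is the last 5 bytes of the SHA-1 digest.
    """
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-5:]

hashed = hash_token("viagra")
# Show the length and a hex rendering; the raw bytes are arbitrary
# binary and are generally not valid text in any character encoding.
print(len(hashed), hashed.hex())
```

Because the key can contain any byte value, including NUL and sequences
that are invalid in the database's text encoding, a text column can
mangle it, while a binary column such as BYTEA stores it unchanged.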

Hope that helps.

Michael




> 
> Thanks,
> 
> -----------------------------------
> Giampaolo Tomassoni - IT Consultant
> Piazza VIII Aprile 1948, 4
> I-53044 Chiusi (SI) - Italy
> Ph: +39-0578-21100
> 
> MAI inviare una e-mail a: NEVER send an e-mail to: 
> [EMAIL PROTECTED]
>
