RE: BayesStore::SQL question

Giampaolo Tomassoni Wed, 13 Dec 2006 15:17:24 -0800

From: Michael Parker [mailto:[EMAIL PROTECTED]
> 
> Giampaolo Tomassoni wrote:
> > Dears,
> > 
> > actually, I see the Bayes database in SA can be either per-user or
> > system-wide.
> > 
> > I would like to have a way to store bayes tokens on a per-user basis,
> > and fetch them in a more system-wide (or perhaps domain-wide) way.
> > 
> 
> Without going much further, you can fake the domain level bayes if you
> want.  You just have to use bayes_override_username in domain based SQL
> user_prefs.
> 
> > My intention is to have each user's bayes contribute to scoring
> > every other user's incoming mail, while still letting each user's own
> > db dominate when scoring mail delivered to that user's mailbox.
> > 
> > To accomplish this, I would probably need to write my own version of
> > the BayesStore::(My|Pg)SQL, but I'm facing a problem: how can I get
> > the message's recipients from a subclass of BayesStore::SQL? I see a
> > $self->{_userid} defined, but it seems to be meant to store the
> > username used to access the db (and it is a scalar, not an array or
> > something like that, which may be needed if the message is addressed
> > to multiple recipients).
> 
> It's not the username to connect to the database, it's the username that
> will be used for all lookups in the database.  It's private and
> calculated based on the username variable in the main SpamAssassin object.
> 
> What you're asking for is data that is not user specific.  You would
> have to obtain your data elsewhere.  SpamAssassin has no knowledge of who
> the recipients of a message are.  The best you could do is parse the
> message itself, looking at To/CC, but we all know that won't really work.
> 
> The way you want to gather data is really a departure from how
> things are done now; you would most likely have to throw out just about
> everything and replace it with a modified version.


Ok. Maybe I'm missing something in the "design overview" of SA, which misled 
me into asking the wrong question.

I'm using SA with amavisd-new, and I have my MTA (postfix) configured such that 
an inbound mail gets split into one copy for each destination mailbox before 
feeding it to amavisd and, thereby, to SA. This was meant to allow for 
per-server + per-domain + per-user SA settings (through a SQL view prioritizing 
and merging them).

So, thanks to your explanation of the $self->{_userid} semantics, and since I see 
that the per-user etc. settings do actually work, I guess that the config I'm 
actually running should set $self->{_userid} to the real recipient user, right?
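
To make the per-server + per-domain + per-user prioritization concrete, here 
is a minimal sketch in Python (not the actual SQL view, which is not shown in 
this thread). The function name and the sample values are hypothetical; the 
preference names are ordinary SA options. The most specific level wins:

```python
def merge_prefs(server, domain, user):
    """Merge three preference levels; later (more specific) levels override."""
    merged = dict(server)   # start from the server-wide defaults
    merged.update(domain)   # domain-level settings override the server's
    merged.update(user)     # per-user settings override everything else
    return merged


# Hypothetical sample data, for illustration only.
server_prefs = {"required_score": "5.0", "use_bayes": "1"}
domain_prefs = {"required_score": "4.0",
                "bayes_override_username": "example.com"}
user_prefs = {"required_score": "6.5"}

effective = merge_prefs(server_prefs, domain_prefs, user_prefs)
print(effective)
```

In this sketch, a domain-level bayes_override_username stays in effect for 
every user of the domain who does not set a value of their own.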


> > Also, I have a question which is loosely related to this. Why do
> > tokens get hashed before being stored in/retrieved from the db?
> > Wouldn't it be better to have them in clear, to allow, for example,
> > easy identification of the "really spam words", which could be used
> > to build rules further penalizing spam messages?
> 
> Sidney touched briefly on this.  I'll add a little more.  Indeed it's all
> about speed improvements and performance.
> 
> Lots of discussion happened around this when I made the change.  I tried
> to keep the option of allowing clear text as well but, and my memory
> might be failing me here, it was about a 12% drop in performance
> allowing that option.  The compromise was several plugin hooks that let
> you build a separate database of the clear text tokens.  I believe I
> posted a proof of concept plugin at the time to show that it would work.

You mean a separate db in a DBM-based bayes, right? That is to say, a separate 
(lookup) table in an SQL DB.
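
A minimal sketch of the lookup-table idea, in Python with an in-memory SQLite 
database. Everything here is an assumption made for illustration: the table 
and column names are simplified and are not the real SA schema, the hash 
(SHA-1 truncated to its last 5 bytes) is only my understanding of what SA 
does, and the learn() helper is hypothetical, not one of the plugin hooks 
Michael mentions.

```python
import hashlib
import sqlite3


def hash_token(token):
    # Assumption: SHA-1 digest truncated to its last 5 bytes.
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the main token table (hashed tokens only).
conn.execute("CREATE TABLE bayes_token ("
             "token BLOB PRIMARY KEY, spam_count INTEGER, ham_count INTEGER)")
# Separate lookup table mapping each hashed token back to its clear text.
conn.execute("CREATE TABLE bayes_token_text ("
             "token BLOB PRIMARY KEY, cleartext TEXT)")


def learn(word, is_spam):
    key = hash_token(word)
    conn.execute("INSERT OR IGNORE INTO bayes_token VALUES (?, 0, 0)", (key,))
    column = "spam_count" if is_spam else "ham_count"
    conn.execute(
        "UPDATE bayes_token SET {0} = {0} + 1 WHERE token = ?".format(column),
        (key,))
    # Record the clear text once; the scoring path never needs this table.
    conn.execute("INSERT OR IGNORE INTO bayes_token_text VALUES (?, ?)",
                 (key, word))


for word in ["viagra", "viagra", "meeting"]:
    learn(word, is_spam=(word == "viagra"))

# Join the two tables to list the "really spam words" in clear text.
rows = conn.execute(
    "SELECT t.cleartext, b.spam_count, b.ham_count "
    "FROM bayes_token b JOIN bayes_token_text t ON t.token = b.token "
    "ORDER BY b.spam_count DESC").fetchall()
print(rows)
```

Scoring would only ever touch the hashed table, so the clear-text table costs 
nothing at scan time and can be queried offline when building rules.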

Thank you so much for your reply. I may have some more questions when I 
start developing my version of BayesStore::PgSQL.

giampaolo

> Also, in PostgreSQL, the column is BYTEA because it is binary data and
> otherwise you get token corruption.
> 
> Hope that helps.
> 
> Michael
> 
> 
> 
> 
> > 
> > Thanks,
> > 
> > ----------------------------------- Giampaolo Tomassoni - IT
> > Consultant Piazza VIII Aprile 1948, 4 I-53044 Chiusi (SI) - Italy Ph:
> > +39-0578-21100
> > 
> > MAI inviare una e-mail a: NEVER send an e-mail to: 
> > [EMAIL PROTECTED]
> > 
>
