Feature Requests item #2939578, was opened at 2010-01-25 18:45
Message generated for change (Comment added) made by sbajic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1126468&aid=2939578&group_id=250683

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Priority: 5
Private: No
Submitted By: Enrico Scholz (ensc)
Assigned to: Nobody/Anonymous (nobody)
Summary: Create an unique but determined signature

Initial Comment:
I am going to use 'dspam' from within a milter, where a message is often checked
for multiple recipients.  Currently, every recipient gets a different signature,
and the message must be cloned for every recipient to add the corresponding
signature.  Things are even worse when two recipients are resolved to the same
user by an external lookup: the message is then processed twice by 'dspam', but
the user is told only one signature.

Hence, it would be nice if a message always produced the same signature.
Returning a SHA1-based HMAC (see RFC 2104) of the message and a
configurable secret key would be perfect.
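
A minimal sketch of the proposal, assuming the signature is simply the hex
HMAC-SHA1 of the raw message bytes. The function name, the key, and the sample
message are hypothetical and are not part of DSPAM:

```python
import hashlib
import hmac


def message_signature(secret_key: bytes, raw_message: bytes) -> str:
    """Return the hex HMAC-SHA1 (RFC 2104) of the raw message bytes."""
    return hmac.new(secret_key, raw_message, hashlib.sha1).hexdigest()


# Hypothetical configurable secret and sample message.
key = b"site-wide-secret"
msg = b"Subject: ...\r\n\r\nSome message\r\n"

# The same bytes and the same key give the same signature for every recipient.
sig_foo = message_signature(key, msg)
sig_bar = message_signature(key, msg)
```

The signature only stays the same if the hashed bytes are identical, so any
per-recipient header would have to be left out before hashing.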

----------------------------------------------------------------------

>Comment By: Stevan Bajic (sbajic)
Date: 2010-02-20 12:35

Message:
> All local recipients which were given as RCPT: will get exactly the
> same e-mail (header + body). Per-user signatures violate this.
>
That is not always the case. I, for example, always have a "Delivered-To"
header in all mail that I get, and this is not the same for every
recipient of a mail.

> Milters do not deliver mail; they process it while the MTA receives
> it (e.g. 'dspam' within the milter classifies mail before the MTA gives
> the final response to DATA ('220 OK' or reject due to spamminess)).
>
Okay

> afair, one of 'dspam' basic ideas is that spam filtering should be
> applied per user.
>
Not many things in DSPAM are a must. You can, but you don't need to.

> afaik, 'dspam' stores the set of tokens within an e-mail at a place
> which is associated with the signature.
>
AND a DSPAM user ID.

> This set of tokens depends
> only on the e-mail but not the recipients, doesn't it?
>
What do you mean by that? I don't understand. Can you rephrase this?

> E.g. the set of tokens for the e-mail sent as
> | MAIL FROM <postmas...@example.com>
> | 220 OK
> | RCPT TO: <f...@example.com>
> | 220 OK
> | RCPT TO: <b...@example.com>
> | 220 OK
> | DATA
> | Subject: ...
> |
> | Some message
> | .
> | 220 OK
> 
> will be the same for 'f...@example.com' and for 'b...@example.com'.
>
No. It will not be the same. The reason why it might be different is the
whitelisting feature of DSPAM. The bigger part of the tokens will be the
same, but whitelisting can result in a bunch of tokens being different for
foo than for bar.

> Each element of this set of tokens will be inserted into a user
> specific database and spam/innocent counters be incremented.
> 
Definitely not. Assume the mail is HAM and that you run a training mode other
than TEFT. Assume further that the mail is correctly classified as HAM for foo
but is classified as SPAM for bar, that foo does not retrain the message as
SPAM, and that bar retrains the message as HAM. Then only the tokens for bar
will be modified. For foo, nothing changes in his token set.

> For retraining, the signature is used to lookup the set of tokens and
> the counters in the user database will be reverted/corrected.
> 
This is not true.

1) One could run DSPAM in pristine mode; then the tokens are not saved in
dspam_signature_data (assuming you use an SQL-based backend in DSPAM).

2) Assume you don't run pristine mode; then the degenerated mail can be
found in dspam_signature_data. This does not necessarily need to be the whole
mail. It could easily be that you have set your database to allow only 4MB of
data in dspam_signature_data and the whole mail was 8MB. When you retrain,
DSPAM then reads the degenerated mail from dspam_signature_data (but only
the first 4MB), TOKENIZES that data, and those tokens are then
switched/added in dspam_token_data.

> Hence, there are two datasets: the tokens which are common for all
> recipients and the classification of the tokens which is user specific.
> 
This is not 100% true. You forget pristine mode. And since this is an
option you can turn on/off on a per-user basis (if you use the preference
extension), you can't say with 100% certainty (from outside DSPAM) that users
foo AND bar will have their (common) dataset in dspam_signature_data.

----------------------------------------------------------------------

Comment By: Enrico Scholz (ensc)
Date: 2010-02-19 21:07

Message:
> Why can a DSPAM signature (as it is today) not be added into the
> headers?

All local recipients which were given as RCPT: will get exactly the
same e-mail (header + body).  Per-user signatures violate this.


> then deliver from your Milter to each user

Milters do not deliver mail; they process it while the MTA receives
it (e.g. 'dspam' within the milter classifies mail before the MTA gives
the final response to DATA ('220 OK' or reject due to spamminess)).


> set your training alias to be executed under the DSPAM user you
> used when classifying/processing the message. Or use something like
> shared groups in DSPAM.

afair, one of 'dspam' basic ideas is that spam filtering should be
applied per user.


> If I understand you right then your goal is to have just one signature
> per mail and you don't care if inside the DSPAM database the data is
> saved multiple times (for each user once) as long as the signature
> stays the same. Right?

afaik, 'dspam' stores the set of tokens within an e-mail at a place
which is associated with the signature.  This set of tokens depends
only on the e-mail but not the recipients, doesn't it?

E.g. the set of tokens for the e-mail sent as

| MAIL FROM <postmas...@example.com>
| 220 OK
| RCPT TO: <f...@example.com>
| 220 OK
| RCPT TO: <b...@example.com>
| 220 OK
| DATA
| Subject: ...
|
| Some message
| .
| 220 OK

will be the same for 'f...@example.com' and for 'b...@example.com'.


Each element of this set of tokens will be inserted into a user
specific database and spam/innocent counters be incremented.

For retraining, the signature is used to lookup the set of tokens and
the counters in the user database will be reverted/corrected.

Hence, there are two datasets: the tokens which are common for all
recipients and the classification of the tokens which is user specific.


----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-02-19 20:20

Message:
Why can a DSPAM signature (as it is today) not be added into the headers?
Each user normally has his own storage in DSPAM, and training/retraining with
a signature is going to switch tokens for that user.

If you want the signature to stay persistent per message then just use one
DSPAM user to classify/process the message and then deliver from your
Milter to each user, adding the same DSPAM signature to the header and set
your training alias to be executed under the DSPAM user you used when
classifying/processing the message. Or use something like shared groups in
DSPAM.

If I understand you right then your goal is to have just one signature per
mail and you don't care if inside the DSPAM database the data is saved
multiple times (for each user once) as long as the signature stays the
same. Right? Adding something like that could be possible but stuff like
UID in signature would then not work.
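
To illustrate the idea of one signature per mail with the data still saved
once per user, here is a rough sketch. It is not DSPAM's schema; the store,
the function names, the signature value, and the sample tokens are all
hypothetical. Entries are keyed by the pair (user ID, signature), so the same
signature can exist for several users without the signature itself carrying
the UID:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a signature table keyed by (uid, signature).
SignatureStore = Dict[Tuple[int, str], List[str]]


def store_signature(store: SignatureStore, uid: int,
                    signature: str, tokens: List[str]) -> None:
    """Save this user's copy of the data under the shared signature."""
    store[(uid, signature)] = list(tokens)


def lookup_for_retrain(store: SignatureStore, uid: int,
                       signature: str) -> Optional[List[str]]:
    """Return the data saved for this user and signature, or None if absent."""
    return store.get((uid, signature))


store: SignatureStore = {}
shared_sig = "example-signature"  # hypothetical per-message signature
store_signature(store, 1, shared_sig, ["some", "message"])
store_signature(store, 2, shared_sig, ["some", "message", "extra"])
```

With this layout the retraining request has to supply the user as well as the
signature, since the signature alone no longer identifies a single entry.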

----------------------------------------------------------------------

Comment By: Enrico Scholz (ensc)
Date: 2010-02-19 19:20

Message:
I want users to be able to force dspam to relearn a message.  Relearning
requires knowledge of the signature, but because the signature is
different for every recipient, it cannot be added to the e-mail headers.
Hence, there is no way for users to relearn a message.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-02-19 18:40

Message:
Hello Enrico,

What issue are you expecting to solve with one unique signature per
message? The current database schema cannot attach multiple UIDs to one
signature.

-- 
Kind Regards from Switzerland,

Stevan Bajić

----------------------------------------------------------------------

