-------- Original Message --------
> Date: Thu, 12 Nov 2009 20:26:20 +0100
> From: coma <[email protected]>
> To: [email protected]
> Subject: Re: [Dspam-user] Dspam MySQL database

> >
> > What? What case?
> >
> > dspam_token_data is the table holding the tokens with their spam/ham hit
> > counter.
> >
> > dspam_signature_data is the table holding a degenerated message for a
> > specific signature.
> >
> >
> > // Steve
> >
> >
> 
> 
> Thank you Steve for your reply and sorry for my bad English.
> 
> I wanted to know what is stored in the column token of the table
> dspam_token_data and what is stored in the column data of the table
> dspam_signature_data.
> 
> I think the column token contains the ID numbers of the tokens that dspam
> generates for each word, and that they can be shared by several users who
> receive or forward the same mail or a mail with the same word(s).
>
No, that is not true. First, the structure of dspam_token_data:
+---------------+---------------------+
| Field         | Type                |
+---------------+---------------------+
| uid           | int(10) unsigned    |
| token         | bigint(20) unsigned |
| spam_hits     | bigint(20) unsigned |
| innocent_hits | bigint(20) unsigned |
| last_hit      | date                |
+---------------+---------------------+

Tokens are indeed saved in "token"; the spam hits of that token are saved in
"spam_hits" and its innocent hits in "innocent_hits". But a token row is NOT
shared between users. The uid field ties each row to one user. So you and I
might have exactly the same token in the database, but I might have 100
innocent hits and 5 spam hits for it while you might have 0 innocent hits and
47 spam hits for exactly the same token.
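
To make the per-user point concrete, here is a minimal sketch in Python. It
uses an in-memory SQLite database as a stand-in for MySQL, keeps the column
names from the table above, and stores the two example rows just described.
The token value, the dates, the UNIQUE constraint and the helper hits_for()
are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names follow the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dspam_token_data (
        uid           INTEGER NOT NULL,
        token         INTEGER NOT NULL,
        spam_hits     INTEGER NOT NULL,
        innocent_hits INTEGER NOT NULL,
        last_hit      TEXT,
        UNIQUE (uid, token)  -- assumption: one row per user and token
    )
    """
)

TOKEN = 1234567890123  # hypothetical token value

rows = [
    (1, TOKEN, 5, 100, "2009-11-12"),  # uid 1: 5 spam hits, 100 innocent hits
    (2, TOKEN, 47, 0, "2009-11-12"),   # uid 2: 47 spam hits, 0 innocent hits
]
conn.executemany("INSERT INTO dspam_token_data VALUES (?, ?, ?, ?, ?)", rows)


def hits_for(uid, token):
    """Return (spam_hits, innocent_hits) for one user's row of a token."""
    return conn.execute(
        "SELECT spam_hits, innocent_hits FROM dspam_token_data "
        "WHERE uid = ? AND token = ?",
        (uid, token),
    ).fetchone()
```

Calling hits_for(1, TOKEN) and hits_for(2, TOKEN) returns two different
counter pairs for the same token, because every lookup is filtered by uid.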

The only way to allow sharing of tokens is to use groups in DSPAM. Read the 
documentation if you need to know more about it.





> And I think the column data contains a hash of all the tokens generated by
> dspam for one mail (one hash corresponding to one signature).
> 
This is not 100% correct either. Again, here is the structure of the table:
+------------+------------------+
| Field      | Type             |
+------------+------------------+
| uid        | int(10) unsigned |
| signature  | varchar(32)      |
| data       | longblob         |
| length     | int(10) unsigned |
| created_on | date             |
+------------+------------------+

The data field contains a binary representation of the message. The message in
the data field should not contain any HTML tags, should have all unnecessary
characters stripped out, etc.

This data is then used when you tell DSPAM to retrain a message. Only then,
not before, does DSPAM read that binary data and tokenize the message. It then
relearns the message as whatever class you told it to use.

Depending on which learning method you use with DSPAM, the resulting tokens
will be added to or updated in dspam_token_data.
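
The retrain flow can be sketched roughly as follows in Python. This is only
an illustration under several assumptions, not DSPAM's actual code: SQLite
stands in for MySQL, the stored data is plain text bytes (the real binary
format is not described here), the tokenizer is a simple whitespace split,
and toy_token_id() and retrain() are hypothetical helpers. It also leaves
out whatever extra adjustments the configured learning method would make.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dspam_signature_data (
        uid INTEGER, signature TEXT, data BLOB, length INTEGER, created_on TEXT
    );
    CREATE TABLE dspam_token_data (
        uid INTEGER, token INTEGER, spam_hits INTEGER, innocent_hits INTEGER,
        last_hit TEXT, UNIQUE (uid, token)
    );
    """
)


def toy_token_id(word):
    """Hypothetical stand-in for DSPAM's token hashing: 63 bits of SHA-1."""
    return int.from_bytes(hashlib.sha1(word.encode()).digest()[:8], "big") >> 1


def retrain(uid, signature, as_spam):
    """Load the stored message for a signature, tokenize it, bump counters.

    Returns the number of distinct tokens touched (0 if nothing was found).
    """
    row = conn.execute(
        "SELECT data FROM dspam_signature_data WHERE uid = ? AND signature = ?",
        (uid, signature),
    ).fetchone()
    if row is None:
        return 0
    # Tokenizing happens only now, at retrain time (toy tokenizer).
    words = set(row[0].decode().split())
    column = "spam_hits" if as_spam else "innocent_hits"
    for word in words:
        token = toy_token_id(word)
        # Add the token for this user if it is new ...
        conn.execute(
            "INSERT OR IGNORE INTO dspam_token_data "
            "VALUES (?, ?, 0, 0, date('now'))",
            (uid, token),
        )
        # ... then update the counter of the class being learned.
        conn.execute(
            f"UPDATE dspam_token_data SET {column} = {column} + 1, "
            "last_hit = date('now') WHERE uid = ? AND token = ?",
            (uid, token),
        )
    return len(words)


# Hypothetical stored message for uid 1 under signature 'abc123'.
conn.execute(
    "INSERT INTO dspam_signature_data VALUES (1, 'abc123', ?, 15, date('now'))",
    (b"cheap pills now",),
)
```

Calling retrain(1, 'abc123', as_spam=True) reads the stored data, tokenizes
it, and touches only the rows belonging to uid 1 in dspam_token_data.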



> When a mail is forwarded, dspam retrieves the signature and "reverses" the
> tokens to "spam" or "notspam" depending on the nature of the forward, and
> checks the ID of the user so that this is reversed just for him.
> 
> Do I have this right, or am I completely mistaken?
> 
> Thank you very much,
> 
> coma
>
Steve
_______________________________________________
Dspam-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspam-user
