Re: [Dspam-user] Merged Group

Steve Tue, 02 Feb 2010 09:44:21 -0800

> DSPAM Virtual UID.
> 
Yes. You should add the group to the DSPAM virtual table.



> Does this mean that I need to periodically run dspam_merge?
> 
No. You can, but you don't need to. I would suggest that you NOT run
dspam_merge. Instead, I would suggest setting up a honeypot to capture Spam
and feeding that Spam daily or weekly into the merged group.
However, the most important part is not to forget to feed Ham as well. That
is very, very important.


> Lets say the merged group identifies a message to be spam, but the
> same type of message merged group member has identified as ham.  What
> happens to this type of message addressed to this member?
> 
Merged groups don't work that way. Let me explain:

Assume the merged group (using UID 2) has these tokens:
+-----+-----------------+-----------+---------------+
| uid | token           | spam_hits | innocent_hits |
+-----+-----------------+-----------+---------------+
|   2 |   1100379183237 |         2 |             0 |
|   2 |   7020201851407 |         0 |             5 |
|   2 |  16306549334969 |         0 |             2 |
|   2 |  31298043379889 |         0 |             2 |
|   2 |  44666538766313 |         5 |             0 |
|   2 |  46521964453465 |         5 |             0 |
|   2 |  48377390509657 |         5 |             0 |
|   2 |  82030331931667 |         0 |             5 |
|   2 | 109124241189538 |         2 |             0 |
|   2 | 145479465005944 |         5 |             0 |
+-----+-----------------+-----------+---------------+

And assume your user (using UID 4 and being a member of the merged group) has
these tokens:
+-----+-------------------+-----------+---------------+
| uid | token             | spam_hits | innocent_hits |
+-----+-------------------+-----------+---------------+
|   4 |  3795419892769691 |         0 |            24 |
|   4 |  9932798052975044 |         0 |           703 |
|   4 | 13688388457136758 |         0 |             1 |
|   4 | 19098714477472823 |         0 |             1 |
|   4 | 19353278680203901 |         1 |             2 |
|   4 | 30536716472363896 |         0 |             2 |
|   4 | 34021786116482433 |         0 |             1 |
|   4 | 36934353911637225 |         0 |             6 |
|   4 | 40661178011653218 |         0 |             1 |
|   4 | 40680590081409654 |         1 |             1 |
+-----+-------------------+-----------+---------------+

Then DSPAM will merge those tokens at runtime into:
+-------------------+-----------+---------------+
| token             | spam_hits | innocent_hits |
+-------------------+-----------+---------------+
|     1100379183237 |         2 |             0 |
|     7020201851407 |         0 |             5 |
|    16306549334969 |         0 |             2 |
|    31298043379889 |         0 |             2 |
|    44666538766313 |         5 |             0 |
|    46521964453465 |         5 |             0 |
|    48377390509657 |         5 |             0 |
|    82030331931667 |         0 |             5 |
|   109124241189538 |         2 |             0 |
|   145479465005944 |         5 |             0 |
|  3795419892769691 |         0 |            24 |
|  9932798052975044 |         0 |           703 |
| 13688388457136758 |         0 |             1 |
| 19098714477472823 |         0 |             1 |
| 19353278680203901 |         1 |             2 |
| 30536716472363896 |         0 |             2 |
| 34021786116482433 |         0 |             1 |
| 36934353911637225 |         0 |             6 |
| 40661178011653218 |         0 |             1 |
| 40680590081409654 |         1 |             1 |
+-------------------+-----------+---------------+

Since in the above example none of the tokens is found in both UID 2 and UID 4,
the result is simply a concatenation of all available data.

Now suppose UID 2 additionally had this token:
+-----+-------------------+-----------+---------------+
| uid | token             | spam_hits | innocent_hits |
+-----+-------------------+-----------+---------------+
|   2 |    31100379183237 |        24 |             2 |
+-----+-------------------+-----------+---------------+

And UID 4 also had this token (the same one as above):
+-----+-------------------+-----------+---------------+
| uid | token             | spam_hits | innocent_hits |
+-----+-------------------+-----------+---------------+
|   4 |    31100379183237 |         1 |          1045 |
+-----+-------------------+-----------+---------------+

Then the merged result for that token would be the sum of both rows:
+-------------------+-----------+---------------+
| token             | spam_hits | innocent_hits |
+-------------------+-----------+---------------+
|    31100379183237 |        25 |          1047 |
+-------------------+-----------+---------------+

So the question is not whether the merged group sees something as Spam or Ham.
The tokens of the user AND the group are MERGED, and their combined result is
what classifies a message as either Ham or Spam.
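
To make the summing concrete, here is a minimal Python sketch of the merge
described above. It is only an illustration and not DSPAM's actual code: the
function name merge_tokens and the dictionaries are made up for this example,
and only a few rows from the tables above are included.

```python
from collections import defaultdict


def merge_tokens(*sources):
    """Sum (spam_hits, innocent_hits) per token across all sources.

    Hypothetical helper for illustration only; not part of DSPAM.
    """
    merged = defaultdict(lambda: [0, 0])
    for source in sources:
        for token, (spam_hits, innocent_hits) in source.items():
            merged[token][0] += spam_hits
            merged[token][1] += innocent_hits
    return {token: tuple(counts) for token, counts in merged.items()}


# A few rows from the tables above: token -> (spam_hits, innocent_hits)
group_uid2 = {
    1100379183237: (2, 0),
    7020201851407: (0, 5),
    31100379183237: (24, 2),    # token shared with UID 4
}
user_uid4 = {
    3795419892769691: (0, 24),
    19353278680203901: (1, 2),
    31100379183237: (1, 1045),  # token shared with UID 2
}

merged = merge_tokens(group_uid2, user_uid4)
print(merged[31100379183237])  # (25, 1047)
```

Tokens that exist on only one side keep their counts unchanged, and the shared
token ends up with the summed counts, just like in the tables.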

This is for MERGED GROUPS. A classification or inoculation group works 
differently.


> BTW... as per your recommendation,  sqlgrey, grossd, sid-filter,
> opedkim, policyd-weight  is phenomenal.  I can't believe how well it
> is working.
> 
So you managed to get my additional patches for Policyd-Weight to work on your
setup? I told you that your Spam volume would go down drastically if you used
those patches :)

And I bet you have a very low (if any) False Positive rate from those
services that block mail. Right?

Could you send me (not to the list) your GROSS settings? I think I still have
not helped you with GROSS. I ask because I would strongly recommend using
different RBLs/RHBL/etc. in GROSS than you use in Policyd-Weight.

It took me a lot of time to fine-tune Policyd-Weight with those additional
patches to reach the blocking rate that you currently have. I am surprised that
people still take the hard path with something like SA, a huge memory- and
CPU-eating beast, instead of taking the easy path with a Postfix policy
service. I guess they don't have as much inbound/outbound traffic as I do, so
low resource usage and high throughput are not important to them.

For me, Spam fighting means constantly looking at new research and implementing
it when possible. So I am always reading new research and constantly trying out
new things. Policyd-Weight is my heuristic wall that blocks, and DSPAM is my
statistical classifier, which has almost nothing to do because of the small,
fast and lean heuristic in front of it :)


// Steve