As you suggested, I will train the globaluser with real, recent spam+ham taken 
randomly from various mail accounts on the old mail server. What would you say 
is the minimum number of spam+ham mails needed to train the globaluser with 
good results? 2000? More?
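
To make the "taken randomly" part concrete, here is a minimal sketch of how such a sample could be collected. It assumes the old server stores mail as Maildir folders with one file per message; the function name sample_corpus and the example paths are hypothetical and not part of DSPAM.

```python
import random
import shutil
from pathlib import Path


def sample_corpus(source_dirs, dest_dir, count, seed=None):
    """Copy up to `count` randomly chosen message files into dest_dir.

    source_dirs: folders holding one message per file (assumed Maildir
    layout, e.g. the cur/ directory of a Junk or Sent folder).
    Returns the number of messages actually copied.
    """
    rng = random.Random(seed)
    # Collect every regular file from all given folders.
    candidates = sorted(
        p for d in source_dirs for p in Path(d).iterdir() if p.is_file()
    )
    # Never ask for more messages than are available.
    chosen = rng.sample(candidates, min(count, len(candidates)))
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for i, src in enumerate(chosen):
        # Prefix with an index so identical names from different accounts do not collide.
        shutil.copy2(src, dest / f"{i:05d}_{src.name}")
    return len(chosen)


# Hypothetical usage: build a spam corpus from two accounts' Junk folders.
# sample_corpus(
#     ["/var/vmail/example.com/alice/.Junk/cur",
#      "/var/vmail/example.com/bob/.Junk/cur"],
#     "/tmp/corpus/spam",
#     1000,
# )
```

The same call with the Sent folders as sources would give the ham side, and the two resulting directories could then be fed to dspam_train.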

Actually, on the new server I have already installed the antispam plugin for 
Dovecot, and it is really nice, as I do not want to use the more annoying 
spam-*/non-spam-* e-mail aliases. The problem here is the people who use POP3 :(

The global user I am speaking about here is actually the global group 
(CLASSIFICATION group definition as explained in chapter 2.1 of DSPAM's README 
file), which each mail account should benefit from automatically before it has 
been trained enough itself. So I really want to train this global group as 
well as possible, to block as much spam as possible right from the beginning.

By the way, speaking of groups, would you rather recommend using shared groups 
for each domain, with all users of a domain in one shared group 
(*@domain.com)? I have the typical case of hosting e-mail accounts for various 
domains.

Thanks again for explaining the inner workings of DSPAM to me. I get that it 
needs to be taught, and the more the better :)

Last question: reading the DSPAM README file, I get the impression that you 
would recommend the hash driver over an RDBMS for better performance, am I 
correct? I am using PostgreSQL right now, but if I can get even better 
performance using the hash driver then I will change to that. I don't really 
see the advantage of using a database yet, especially for a single mail 
server.


Cheers,
ML




On Wednesday, November 13, 2013 10:57 PM, Tom Hendrikx <t...@whyscream.net> wrote:
 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 13-11-13 21:58, ML mail wrote:
> Hi Tom,
> 
> Thanks for your answer. Yes, indeed I do receive ham and spam,
> but my goal here is to build up a good starting global corpus to
> block as much spam as possible from the beginning, as I will be
> migrating quite a lot of domains and mail accounts to this new
> mail server using dspam. I would like to avoid the situation where
> user accounts get migrated and, as soon as they are on the new
> server, receive a lot of spam messages until the users train their
> accounts themselves by marking messages as spam.

So collect the spam corpus from existing accounts that you trust, for
instance your own. The last half year of spam in your junk folder
should be a lot better than the ancient SA corpus, even when it
contains only a few hundred messages.

You could use Sent folder contents as ham (most IMAP users have their
outgoing mail on the server).

Use dspam_train on the collected mail, then migrate some accounts that
you own to the new machine. See if the filtering is ok; if not, train
some more using a feedback loop based on the messages that people send
to designated mail addresses, or by hooking up retraining directly in
the IMAP server [1].

[1] http://wiki2.dovecot.org/Plugins/Antispam

> 
> The problem is that people are lazy and sometimes won't even bother
> to move their spam mails into the spam folder or send them to a
> special spam@ email address. So I am afraid that users will not
> train dspam and they will keep on getting a lot of spam.

As long as you have some users doing that, and they are working on a
global user (and mail contents from users don't differ too much), you
should be fine.

> 
> Now this makes me think, can dspam somehow autotrain? I mean, get
> better at spam detection over time without an end user having to
> report the FNs and FPs to dspam? Or am I just dreaming of science
> fiction stuff here ;-)

Dspam is a heuristic system, which means that it doesn't know
anything, but will learn from a teacher. The teacher should know what
to teach (i.e. what is spam and what is ham).
You (and the retraining users) are the teacher(s). If the teacher
doesn't tell the student that he learnt a wrong thing, how will the
student correct its work and actually get better? The above
suggestions are only tools for teachers, to minimize the effort for
doing the required teaching work. :)

FWIW: I use dovecot-antispam to retrain ham and spam, so literally
all I have to do to train a FP is 'move ham mail from Spam to Inbox
folder in IMAP client', and vice versa for a FN.


> 
> Cheers ML
> 
> 
> 
> 
> On Wednesday, November 13, 2013 9:22 PM, Tom Hendrikx 
> <t...@whyscream.net> wrote:
> 
> 
> On 13-11-13 20:49, ML mail wrote:
>> Hello,
> 
>> I would like to know what you guys recommend as the best method for
>> training a global user which will then be used for every account
>> as a starting base. I have defined my global user as such in the
>> group file:
> 
>> Unfortunately I have the impression that my globaluser is not
>> well trained, as still more than 50% of the spam is being seen as
>> innocent. I suspect this has to do with the fact that the spam
>> and ham mails from spamassassin are very old (10 years).
> 
> 
>> How would you train a global user, and with which data? Is there
>> any public spam/ham data somewhere which can be used?
> 
> 
> You are receiving spam and ham, right? Use that for training: a
> public ham corpus doesn't look in any way like your regular ham,
> and a public spam corpus from 10 years ago doesn't resemble actual
> recent spam messages.
> 
> If using (ancient) public corpora was a good solution to spam
> filtering, we could drop heuristic engines and just release a
> binary database dump every month based on public corpora :)
> 
> Regards, Tom
> 
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJSg/QiAAoJEJPfMZ19VO/14+gQAJjMqncUGBkQt1bgbzLOO5OS
kx4ZvYRBbjuaX7HJtya4Xz1aNHEYGxXZmsA91sQk+nMf80ULitZ2UjCm5zSqtmWm
ut0ZUlcIsL/aZVToDHbpt4BF0eNypltOGDasdmxqeX/7j4SDIG3dDjqRR5JhW3ni
ZJSZE/VFfuHU5Z4krzbZzPQgUymQMs9mDsCenRXJi0f1X8odNLGo3iYuKTdaN2du
+k9pkCeyJ1FTKQXeJ4CSAUUTOQp0YoPPtftOaWrCm7yzguC6S6hto94MC0mRWpxO
XFia8oiqCff5YZb5ggaZdsXJw6a0qyEZxu4r2zT58gfUlt4bY63dGuVQlVuIW1LK
w1sGKh2Cp+fJFZQK1ra4U2j9xjs8bCm5Z+xRZ176STOcjuKNbvyXdKw/z/8roiNF
fMasnj9F+gFKexaqKOKGpkg8Lc2jEWO9b1nX72QoefrdD91iFRz4Q7EM10Wp7Cot
7ZMKM70Tz+kXgtSsEilbyHYzyPq9ULRMUznQZUSgk1grM8oYaHl9mNUXPb6ujJ/3
y8+0sS8mVNovRjIWW2jTZZKpgSpkYyBx8dn8/Lw3qJSYzzNs+Jcs3XiwHp9wslpm
baEK3EvU74tMAeU6pSLw2pi9FmWnWarx2zMJPwsEpz9T0I6cFworYJAd5R31CrZQ
1h0/tuly60YyXqAZPZSW
=F6PM

-----END PGP SIGNATURE-----
