Hi Tom,
Thanks for your answer. Yes, indeed, I do receive ham and spam, but my goal
here is to build up a good initial global corpus so that as much spam as
possible is blocked from the very beginning, as I will be migrating quite a
lot of domains and mail accounts to this new mail server using dspam. I would
like to avoid the situation where user accounts get migrated and, as soon as
they are on the new server, receive a lot of spam until the users train their
accounts themselves by marking the messages as spam.
The problem is that people are lazy and sometimes won't even bother to move
their spam mails into the spam folder or forward them to a special spam@ email
address. So I am afraid that users will not train dspam and will keep on
getting a lot of spam.
Now this makes me think: can dspam somehow autotrain? I mean, can it get better
at spam detection over time without an end user having to report the FNs
and FPs to dspam? Or am I just dreaming of science fiction stuff here ;-)
Cheers
ML
On Wednesday, November 13, 2013 9:22 PM, Tom Hendrikx <t...@whyscream.net>
wrote:
On 13-11-13 20:49, ML mail wrote:
> Hello,
>
> I would like to know what you guys recommend as best method for
> training a global user which will then be used for every account
> as a starting base. I have defined my global user as such in the
> group file:
>
> Unfortunately I have the impression that my globaluser is not well
> trained as still more than 50% of the spam is being seen as
> innocent. I somehow suspect this having to do with the fact that
> the spam and ham mails from spamassassin are very old (10 years).
>
>
> How would you train a global user? and with which data? is there
> any public spam/ham data somewhere which can be used?
>
You are receiving spam and ham, right? Use that for training: a public
ham corpus doesn't look in any way like your regular ham, and a public
spam corpus from 10 years ago doesn't resemble actual recent spam
messages.
If using (ancient) public corpora were a good solution to spam filtering,
we could drop heuristic engines and just release a binary database dump
every month based on public corpora :)
Regards,
Tom
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user