[ https://issues.apache.org/jira/browse/COMDEV-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Thomas updated COMDEV-260:
-------------------------------
Component/s: GSoC/Mentoring ideas
> GSOC 2018 SpamAssassin Bayes Token ID
> -------------------------------------
>
> Key: COMDEV-260
> URL: https://issues.apache.org/jira/browse/COMDEV-260
> Project: Community Development
> Issue Type: Project
> Components: GSoC/Mentoring ideas
> Reporter: Kevin A. McGrail
> Priority: Major
>
> From an idea by Diane F Skoll (used with permission):
> We tokenize inbound messages and store the tokens on the server. In each
> message, we add links for doing training. When you click on a training link,
> the system trains the message based on the tokens stored on the server. In
> that way, you are training using exactly the tokens that the Bayes code saw.
> For SA, the key point is a framework that stores the Bayesian tokens from an
> email before it is delivered, so that a later "this is spam" / "this is ham"
> mechanism can use that information without needing the entire email.
> Adding a header with the message id under which the tokens are stored makes
> a "train as spam" / "train as ham" framework easier to build.
> The issues you are pointing to have more to do with the implementation of
> the "this is spam" / "this is ham" mechanism.
> Storing just the tokens takes less space and mitigates privacy and legal
> concerns.
> sa-learn would then be extended to take the message id and learn as spam/ham
> from the stored tokens, instead of being fed the entire message.
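
To make the flow described above concrete, here is a minimal sketch in Python
of the store-then-train idea. It is only an illustration under stated
assumptions, not SpamAssassin code: the header name X-Bayes-Token-ID, the
TokenStore class, and the toy tokenize() function are all hypothetical, the
storage is an in-memory dict, and SpamAssassin's real Bayes tokenizer and
sa-learn interface work differently.

```python
import re
import uuid
from collections import Counter
from email import message_from_string

TOKEN_HEADER = "X-Bayes-Token-ID"  # hypothetical header name, not a real SA header


def tokenize(text):
    """Toy tokenizer (assumption): unique lowercase words of 3+ characters."""
    return sorted(set(re.findall(r"[a-z0-9$'-]{3,}", text.lower())))


class TokenStore:
    """In-memory stand-in for the server-side token storage."""

    def __init__(self):
        self.tokens_by_id = {}        # token id -> tokens seen at scan time
        self.spam_counts = Counter()  # token -> times learned as spam
        self.ham_counts = Counter()   # token -> times learned as ham

    def scan(self, raw_message):
        """Tokenize before delivery, store the tokens, tag the message with their id."""
        msg = message_from_string(raw_message)
        # Sketch only handles single-part text messages.
        body = "" if msg.is_multipart() else msg.get_payload()
        token_id = uuid.uuid4().hex
        self.tokens_by_id[token_id] = tokenize(msg.get("Subject", "") + "\n" + body)
        msg[TOKEN_HEADER] = token_id
        return token_id, msg.as_string()

    def learn(self, token_id, is_spam):
        """Train from the stored tokens only; the original message is not needed."""
        tokens = self.tokens_by_id[token_id]
        target = self.spam_counts if is_spam else self.ham_counts
        target.update(tokens)
        return len(tokens)


if __name__ == "__main__":
    store = TokenStore()
    token_id, tagged = store.scan("Subject: Cheap pills\n\nBuy cheap pills now\n")
    # Later, a "this is spam" click only needs the id from the header.
    store.learn(token_id, is_spam=True)
    print(store.spam_counts)
```

The point of the design is that learn() never touches the message itself, so
only the token list has to be retained on the server.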
>
>
> Apache SpamAssassin is a mail filter to identify spam. It is an intelligent
> email filter which uses a diverse range of tests to identify unsolicited bulk
> email, more commonly known as Spam. These tests are applied to email headers
> and content to classify email using advanced statistical methods.
> In addition, SpamAssassin has a modular architecture that allows other
> technologies to be quickly wielded against spam and is designed for easy
> integration into virtually any email system.
> It is primarily written in Perl with a few bits in C and shell scripts for
> system integration.
> The compendium at
> https://raptor.pccc.com/raptor.cgim?template=email_spam_compendium is helpful
> for understanding some of the concepts behind SpamAssassin.
> It will be helpful for a student on this project to understand SMTP, but
> more important is a willingness to learn and to set up your own mail server
> with SpamAssassin on a Linux distribution for a personal test domain.
> Assistance will be provided to get a basic sandbox framework in place for
> learning.
> As email becomes more commoditized by major providers, knowledge of email
> systems and their security is dwindling. This opportunity can provide
> real-world experience with an email security product that is employed by
> countless commercial systems in the world.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)