Thanks, Sarang. I got your email at my address as well, but it's a holiday weekend for me in the States. (Happy Easter to all those who celebrate!)

It looks to me like you understand that programming is a state of mind, not a language, which is good, and that you are capable of switching gears.

I will sign up as a mentor on Monday and let you know when that is done.

From there, you can look at the SA code and answer the basic questions below, because to me your proposal needs clarification before it can switch to this project. There is a lot of information in the proposal I don't grok, so I think the basic high-level questions for you are:

- What does SA have now that relates to your proposal?
- What do you propose? A plugin? Multiple plugins?
- Why is this anticipated to be better than what exists now?

Regards,
KAM




On 4/4/2015 4:17 PM, Sarang Shrivastava wrote:

---------- Forwarded message ----------
From: *Sarang Shrivastava* <[email protected]>
Date: Sat, Apr 4, 2015 at 11:15 AM
Subject: Re: LOOKING OUT FOR A MENTOR FOR GSOC 2015
To: "Kevin A. McGrail" <[email protected]>


Hi Kevin,

Before I came in contact with Rspamd I didn't know Lua at all, but within a week I was proficient enough to at least understand the parts of the Rspamd source code written in Lua. As you know, necessity is the mother of invention, so learning Perl and Redis would not be a hurdle.

I was worried that I first needed to find a mentor. Now that I have one (hopefully, since you seem to be interested), I will start digging into the SA source code today and brushing up on my Perl and Redis skills.

Regarding the dataset, what I plan is:

First, I could directly use the well-known Enron dataset for spam filters:
http://www.aueb.gr/users/ion/data/enron-spam/

Second, I could take the spam dataset from:
http://untroubled.org/spam/
which has a collection of spam from 1998-2011, and take the ham dataset from my own mail account by importing my (or, for that matter, anyone's) mail from the Gmail server:
https://www.mattcutts.com/blog/backup-gmail-in-linux-with-getmail/

I'll set up my development environment today. I didn't understand one of your questions: "Additionally, what resources do you have to develop and test this code on?" If you mean where I would test my code: initially I would just work on the test data and take input directly from the dataset in my Perl script (which I would write). Alternatively, if SA has a testing framework, I could use that to test my script. If I need to write the unit tests myself, that could be done, but it would be better if there were a framework I could use.

Just a thought:
While going through the SA source code, I came across a script whose comments said "This is the general class used to train a learning classifier with new samples of spam and ham mail, and classify based on prior training."
But I guess this is primarily for Bayesian filtering.
If so, I can design a similar script for my testing purposes.

One more thing: once I am done with the coding part, I can turn off the other rules that SA uses to filter spam and turn on only the filter from my code. This would let me confirm that everything is working correctly, and then I could focus on improving the performance of the filtering process.

So my plan for the upcoming week is to take a deeper look into the SA source code (the part where Bayesian filtering is implemented) while learning Perl and Redis side by side.

What else would you like me to do? Your suggestions are most welcome and would help me better understand the SA project and how to get things done.

Cheers,
Sarang

On Fri, Apr 3, 2015 at 11:47 PM, Kevin A. McGrail <[email protected]> wrote:

    Hi Sarang,

    I've mentored in past GSOCs so I'm interested in helping you but I
    am concerned about your proposal and the SpamAssassin project.  So
    I can't sign off on it as-is but I'd like to see if we can fix that.

    The SA project is built on plugins, primarily in Perl. I didn't
    see Perl or Redis in your proficiencies, which I have no doubt you
    can learn, but I'd like to know more about your plans there.

    You also mentioned a data set and I'm not sure what data set you
    plan to use for testing. Additionally, what resources do you have
    to develop and test this code on?  These may be simple or
    difficult hurdles but they merit attention.

    Just replacing SpamAssassin where Rspamd exists doesn't really
    mean the Project Proposal is ready to go, because of things like
    the plugin language (not Lua), etc.

    Can you look at SA and delve a bit more into the end goal with
    your proposal for SA?  I understand completely if this isn't a fit
    so don't hesitate to bow out.

    regards,
    KAM


    On 4/3/2015 1:06 PM, Sarang Shrivastava wrote:
    Hello all,

    I am Sarang Shrivastava, an open source enthusiast from MNNIT,
    Allahabad, India.

    While applying for this year's GSoC I made a blunder. In the
    initial phase I was interested in working with the Rspamd
    organisation (basically a spam filter) on the idea of
    "IMPLEMENTING META-STATISTIC ALGORITHMS".
    But while submitting the proposal I accidentally submitted it
    to the Apache Software Foundation.

    I asked the mentors of both Rspamd and Apache to somehow transfer
    my proposal to Rspamd, but this can't happen now.

    The thing is, my proposal is not organisation-specific. Any open
    source spam filtering project that does not have this idea can
    take advantage of it. I went through the SpamAssassin wiki page
    and found that it only has Bayesian filtering as a statistical
    classification technique, but the other machine learning methods
    that I have listed in my proposal could noticeably improve the
    efficiency of the spam filtering process.

    So, I would really appreciate it if anyone could mentor me
    throughout the GSoC period. I want to work on this proposal, but
    unless and until one of you signs up as a mentor and accepts my
    proposal in Melange before 12th April, I cannot work on it further.

    I kindly request that anyone among you who is interested in my
    idea please be my mentor. I am sure that, given a chance to
    prove myself, I will not disappoint you.

    The link to my proposal is:
    https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2015/xlr_24/5629499534213120

    I have also enclosed a copy of my proposal as an attachment.
    PS: In my attached proposal, wherever I wrote Rspamd, I have
    replaced it with SpamAssassin.

    Cheers,
    Sarang



    --
    *Sarang Shrivastava*
    *Computer Science & Engineering*
    *MNNIT Allahabad*


--
*Kevin A. McGrail*
President

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-798-0171 (wireless)
[email protected]
