Hi Saahil,

Re: Perl, the project is primarily in Perl, and you do not list it, or any similar language like PHP, in your Proficiencies. I would address that. The word Perl does not appear a single time.

Your Biography is a little light on why this is something you feel you can implement. The mentors will likely NOT be able to help you with the science; they will focus instead on the community, processes, and open source in general.

Re: Email and spam, do you have any experience with email traffic or spam? If so, add it. If not, explain what you plan to do to address that.

Re: Deliverables, I think you'll need to propose the first draft of that. But your goal will likely be a plugin for Apache SpamAssassin that can be installed and configured to provide multiple configurable statistical analysis algorithms to better identify ham (good email) and/or spam (bad email). Please use Apache SpamAssassin to properly brand the title.

Re: Scheduling/timelines, I have no input except that past proposals I have read have included more phases and did not add "optional" items. I'd prefer to see small increments to make sure you stay on schedule and don't get overwhelmed and find yourself way behind as time progresses.

Re: Testing Methodology, this is likely the most critical missing part. I am a fan of test-driven development, where you set up tests that should pass and tests that should fail, and use continuous testing as you add code to confirm your development is progressing well. This is especially important because spam analysis often doesn't work the way people expect, and tests with statistics can help identify issues. For example, the hypothesis here is that these statistical algorithms will be better than Bayes, so you'll need a baseline for comparison. Additionally, even experts in the field are surprised when they think something will prove the hamminess of an email but it in fact shows the opposite. A real-world example: SPF is a policy that, when introduced, was supposed to provide an automated mechanism that says "this is an email from a legitimate mail server for my domain". However, the FIRST wave of people to adopt it were all spammers. So it became more of a spam indicator than a ham indicator.
It was a very interesting outcome.

Re: Corpora, you'll want a corpus of carefully hand-sorted ham and spam. Have you thought about how you'll get that? I *might* be able to help, but it's 50/50.

Re: Research papers, you mention reading research papers on statistical algorithms from a previous proposal. You'll want to list them to show which ones you plan to study.

Re: "Discussions with the SA community regarding the various types of spams that the present SA can handle." This is unclear. What is a "type of spam" to you? Do you have a list of types of spam?

Re: "Brainstorming with the mentors and SA community about the various input features and parameters that can have a huge impact on the overall performance of the listed neural nets models." I think this is flawed. There won't be a ton of people who can discuss this with you. You'll likely need to use the scientific process to show what has a performance impact. This is not busy work or school work. This is an experiment that has not been tried at the SA project.

Re: "actively involved with the community." This is a stretch. A few emails do not active involvement make.

Re: Bonding, you might consider raising that to 1-2 major bugs and 10-20 minor bugs.

Re: Credits/references, I would add more clarity about where each of those references is used.

Regards,
KAM
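
P.S. To make the baseline comparison from the testing point concrete, here is a minimal sketch in Python (only for brevity; the plugin itself would be Perl). Everything in it is hypothetical: `error_rates`, the toy corpus, and the two stand-in classifiers are made up for illustration and are not SpamAssassin APIs. `baseline` stands in for the existing Bayes results and `candidate` for the new algorithm.

```python
from typing import Callable, List, Tuple

# A classifier takes the message text and returns True when it thinks it is spam.
Classifier = Callable[[str], bool]


def error_rates(classify: Classifier, corpus: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) on a hand-sorted corpus.

    corpus is a list of (message, is_spam) pairs; the labels are the hand sorting.
    """
    false_pos = false_neg = ham_total = spam_total = 0
    for message, is_spam in corpus:
        predicted_spam = classify(message)
        if is_spam:
            spam_total += 1
            if not predicted_spam:
                false_neg += 1  # spam that slipped through
        else:
            ham_total += 1
            if predicted_spam:
                false_pos += 1  # good mail wrongly flagged
    fp_rate = false_pos / ham_total if ham_total else 0.0
    fn_rate = false_neg / spam_total if spam_total else 0.0
    return fp_rate, fn_rate


# Toy corpus, invented for this sketch; a real one would be thousands of messages.
corpus = [
    ("meeting moved to 3pm", False),
    ("lunch on friday?", False),
    ("cheap pills buy now", True),
    ("you won a prize, click here", True),
]


def baseline(message: str) -> bool:
    # Hypothetical stand-in for the existing (Bayes) verdict.
    return "buy now" in message


def candidate(message: str) -> bool:
    # Hypothetical stand-in for the new algorithm under test.
    return "buy now" in message or "click here" in message


baseline_rates = error_rates(baseline, corpus)
candidate_rates = error_rates(candidate, corpus)
print("baseline  (fp, fn):", baseline_rates)
print("candidate (fp, fn):", candidate_rates)
```

The same two numbers computed on a real hand-sorted corpus would show whether the new algorithm actually beats the baseline, and a test asserting that the rates do not regress can run on every change.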
