Hi Kevin and Apache SpamAssassin Dev Community,

I have addressed all the changes you suggested on the previous draft:

1) I mentioned that I will learn Perl in the week before the community bonding period. It will not take much time, and I can assure you that the language is not going to be an issue.
2) I expanded the biography section a bit.
3) I made significant changes to the timeline.
4) I'm planning to use CMake/Travis CI for automated testing. If there is a better alternative, please do suggest it.
5) I added links to the research papers that I will be reading during the timeline.
6) I updated the timeline to include gaining more in-depth knowledge of email traffic and spam, and listed some links for that purpose.
7) I updated the credits.
8) I made other changes in various parts of the proposal.
Thanks for your detailed feedback on the previous draft. Here is the link to the updated GSoC 2018 proposal:
<https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>

Please review it rigorously and suggest any changes that I should make. Awaiting a favorable response.

Thanks,
Saahil Sirowa
B.Tech Computer Science and Engineering
Indian Institute of Technology, Hyderabad

On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <[email protected]> wrote:
> Hi Saahil,
>
> re: Perl. As the project is primarily in Perl and you do not list it, or any similar language like PHP, in your Proficiencies, I would address that. The word Perl does not appear a single time.
>
> Your Biography is a little light on why this is something you feel you can implement. The mentors will likely NOT be able to help you with the science, rather focusing on the community, processes, and open source in general.
>
> re: Email and Spam, do you have any experience with email traffic or spam? If so, add it. If not, explain what you plan to do to address that.
>
> Re: Deliverables, I think you'll need to propose the first draft of that. But your goal will likely be a plugin for Apache SpamAssassin that can be installed and configured to provide multiple configurable statistical analysis algorithms to better identify ham (good email) and/or spam (bad email).
>
> Please use Apache SpamAssassin to properly brand the title.
>
> Re: I have no input on the scheduling/timelines except that past proposals I have read included more phases and did not add "optional" items. I'd prefer to see small increments to make sure you stay on schedule and don't get overwhelmed and find yourself way behind as time progresses.
>
> Re: Testing Methodology, this is likely the most critical missing part.
> I am a fan of test-driven development, where you set up tests that should pass and fail and use continuous testing as you add code to confirm your development is progressing well.
>
> This is especially important because spam analysis often doesn't work the way people expect, and tests with statistics can help identify issues.
>
> For example, your hypothesis is that these statistical algorithms will be better than Bayes. So you'll need a baseline for comparison.
>
> Additionally, even experts in the field are surprised when they think something will prove the hamminess of an email but it in fact shows the opposite. Real-world example: SPF, when introduced, was a policy that was supposed to provide an automated mechanism that says "this is an email from a legitimate mail server for my domain".
>
> However, the FIRST wave of people to adopt it were all spammers. So it became a spam indicator more than a ham indicator. It was a very interesting outcome.
>
> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and spam. Have you thought about how you'll get that? I *might* be able to help but it's 50/50.
>
> Re: You mention reading research papers on statistical algorithms from a previous proposal. You'll want to list them to show which ones you plan to study.
>
> re: "Discussions with the SA community regarding the various types of spams that the present SA can handle." is unclear. What is a "type of spam" to you? Do you have a list of types of spam?
>
> re: "Brainstorming with the mentors and SA community about the various input features and parameters that can have a huge impact on the overall performance of the listed neural nets models." I think this is flawed. There won't be a ton of people who can discuss this with you. You'll likely need to use the scientific process to show what has a performance impact. This is not busy work or school work. This is an experiment that has not been tried at the SA project.
>
> re: "actively involved with the community." is a stretch. A few emails do not active involvement make.
>
> re: Bonding, you might consider raising that to 1-2 major bugs and 10-20 minor bugs.
>
> Re: Credits/references, I would add more clarity about where each of those references is used.
>
> Regards,
> KAM
