Wanted to check in and see how you are doing. This blog post has gotten some praise
https://medium.com/@owtf/google-summer-of-code-writing-a-good-proposal-141b1376f076

--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Wed, Mar 21, 2018 at 7:52 AM, Kevin A. McGrail wrote:

> Allowing comments might be helpful, though :-)
>
> On Wed, Mar 21, 2018 at 12:36 AM, Rajkiran Rajkumar wrote:
>
>> @Saahil, kindly make your doc view-only for people with a link to it.
>> Giving edit permissions to the world is a bad idea.
>>
>> Thanks,
>> Rajkiran
>>
>> On Tue, Mar 20, 2018 at 5:17 PM, Kevin A. McGrail wrote:
>>
>>> +users
>>>
>>> All we give is feedback. The submission to GSoC is what matters. So if
>>> you mentioned Perl here, that's not going to carry over to the reviewers.
>>>
>>> Can someone with fresh eyes take a look at this? I read it too recently,
>>> so I will gloss over it too much.
>>>
>>> Here are some posts the mentors list thought might be helpful. The
>>> first, I believe, covers the POV of someone who did not get selected.
>>>
>>> https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334
>>>
>>> https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/
>>>
>>> Regards, KAM
>>>
>>> On Tue, Mar 20, 2018, 03:57 Saahil Sirowa wrote:
>>>
>>>> Hi Kevin and Apache SpamAssassin Dev Community,
>>>>
>>>> I have made all the changes you suggested in the previous draft.
>>>> 1) I mentioned learning Perl a week before the community bonding
>>>> period. It will not take much time.
>>>> I can assure you that language is not going to be an issue.
>>>> 2) I updated the biography part a bit.
>>>> 3) Significant changes have been made in the Timeline.
>>>> 4) I'm planning to use CMake/Travis CI for automated testing. If there
>>>> is a better alternative, please do suggest it.
>>>> 5) I gave links to research papers that I will be reading in the
>>>> timeline.
>>>> 6) I updated the timeline to include gaining more advanced knowledge
>>>> about email traffic and spam. I listed some links for that purpose.
>>>> 7) I updated the credits.
>>>> 8) Other changes have been made in various parts of the proposal.
>>>>
>>>> Thanks for your previous detailed feedback.
>>>>
>>>> Here is the link to the updated proposal:
>>>> GSoC 2018 proposal
>>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
>>>> Please rigorously review it and suggest any changes that I should make.
>>>>
>>>> Awaiting a favorable response.
>>>>
>>>> Thanks...
>>>> Saahil Sirowa
>>>> B. Tech Computer Science and Engineering
>>>> Indian Institute of Technology, Hyderabad
>>>>
>>>> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail wrote:
>>>>
>>>>> Hi Saahil,
>>>>>
>>>>> re: Perl. As the project is primarily in Perl and you do not list that
>>>>> in your Proficiencies, or any similar languages like PHP, I would address
>>>>> that. The word Perl does not appear a single time.
>>>>>
>>>>> Your Biography is a little light on why this is something you feel you
>>>>> can implement. The mentors will likely NOT be able to help you with the
>>>>> science; they will focus on the community, processes, and open source in
>>>>> general.
>>>>>
>>>>> re: Email and spam, do you have any experience with email traffic or
>>>>> spam? If so, add it. If not, explain what you plan to do to address
>>>>> that.
>>>>>
>>>>> Re: Deliverables, I think you'll need to propose the first draft of
>>>>> that.
>>>>> But your goal will likely be a plugin for Apache SpamAssassin that
>>>>> can be installed and configured to provide multiple configurable
>>>>> statistical analysis algorithms to better identify ham (good email) and/or
>>>>> spam (bad email).
>>>>>
>>>>> Please use Apache SpamAssassin to properly brand the title.
>>>>>
>>>>> Re: I have no input on the scheduling/timelines except that past
>>>>> proposals I have read have included more phases and have not added
>>>>> "optional" items. I'd prefer to see small increments to make sure you
>>>>> stay on schedule and don't get overwhelmed and find yourself way behind
>>>>> as time progresses.
>>>>>
>>>>> Re: Testing Methodology, this is likely the most critical missing
>>>>> part. I am a fan of test-driven development, where you set up tests that
>>>>> should pass and fail and use continuous testing as you add code to confirm
>>>>> your development is progressing well.
>>>>>
>>>>> This is especially important because spam analysis often doesn't work
>>>>> the way people expect, and tests w/statistics can help identify issues.
>>>>>
>>>>> For example, it is a hypothesis that these statistical algorithms
>>>>> will be better than Bayes. So you'll need a baseline for comparison.
>>>>>
>>>>> Additionally, even experts in the field are surprised when they think
>>>>> something will prove the hamminess of an email but in fact it shows the
>>>>> opposite. Real-world example: SPF is a policy that, when introduced, was
>>>>> supposed to allow an automated mechanism that says "this is an email from
>>>>> a legitimate mail server for my domain".
>>>>>
>>>>> However, the FIRST wave of people to adopt it were all spammers. So
>>>>> it became a spam indicator more than a ham indicator. It was a very
>>>>> interesting outcome.
>>>>>
>>>>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>>>>> spam. Have you thought about how you'll get that? I *might* be able to
>>>>> help, but it's 50/50.
>>>>>
>>>>> Re: You mention reading research papers on statistical algorithms from
>>>>> a previous proposal. You'll want to list them to show which ones you plan
>>>>> to study.
>>>>>
>>>>> re: "Discussions with the SA community regarding the various types of
>>>>> spams that the present SA can handle." is unclear. What is a "type of
>>>>> spam" to you? Do you have a list of types of spam?
>>>>>
>>>>> re: "Brainstorming with the mentors and SA community about the various
>>>>> input features and parameters that can have a huge impact on the overall
>>>>> performance of the listed neural nets models." I think this is flawed.
>>>>> There won't be a ton of people who can discuss this with you. You'll
>>>>> likely need to use a scientific process to show what has a performance
>>>>> impact. This is not busy work or school work. This is an experiment that
>>>>> has not been tried at the SA project.
>>>>>
>>>>> re: "actively involved with the community." is a stretch. A few
>>>>> emails do not active involvement make.
>>>>>
>>>>> re: Bonding, you might consider raising that to 1-2 major bugs and
>>>>> 10-20 minor bugs.
>>>>>
>>>>> Re: Credits/references, I would add more clarity about where each of
>>>>> those references is used.
>>>>>
>>>>> Regards,
>>>>> KAM
