Hello All, I have updated the proposal. I hope this one is better. Please share your feedback.
https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#

FYI: Student Application Deadline is April 3, 16:00 UTC.

Regards,
Krishna

On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:

> Hello Nakul,
> My comments in *Italics* below.
>
> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <naku...@gmail.com> wrote:
>
>> Hi Krishna,
>>
>> Here are some questions/remarks I have about parts of your proposal:
>>
>> In the section titled Summary -
>>
>> "The systematic evaluation of performance can be measured with performance tests and micro-benchmarks"
>> We currently do not have any micro-benchmarks. Do you plan on adding any?
>> (It would be awesome, but remember to keep the number of tasks reasonable given the time frame and your familiarity with the project)
>>
> *- Removed micro-benchmarks from the proposal.*
>
>> Your summary section feels like it's generally applicable for performance testing on any project, which is good. However, when it comes to talking about what you'd actually be doing, I see - "build a benchmark infrastructure and conduct experiments, that compare different choices in critical parts (sparsity thresholds, optimisation decisions, etc..)".
>>
> *- I agree and have made these changes.*
>
>> Going over each point:
>>
>> 1. "build a benchmark infrastructure" - ok, I guess this subsumes pretty much all the tasks involved
>> 2. "conduct experiments" - sure, although I think you mean testing your benchmarking infrastructure, please correct me if this is not what you meant
>> 3. "that compare different choices in critical parts"
>> a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML already does and what to add.
>> b. "optimization decisions" - could you provide an example or two of what exactly you mean by this? Do you mean to enable and/or disable certain optimizations and run the perf suite and also automate the process?
>> or something else?
>> c. "etc" - more detail would be nice here. It would be nice to know what exactly you are committing to.
>>
> *- Will add more details in this section.*
>
>> In the section titled Deliverables -
>>
>> You mention
>> - "automation for all performance tests" - awesome! This is the primary task
>> - "automatic scripts to test performance on a cloud provider" - this is great
>> - "web dashboard" - awesome! This is a nice-to-have
>>
>> But before the "cloud provider" and "web dashboard" tasks, we'd like to robustly check for errors, record performance numbers and generate reports (Tasks 2 - 6 on https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've mentioned some of these tasks in your "Project milestones" section as "Understand metrics to be captured like time, memory, errors". It'd be good to put them here as well.
>>
> *- Will add this information under Deliverables.*
>
>> Remember, you might also need to change the way SystemML reports errors and performance numbers to complete your tasks. You, along with the currently active members of SystemML, might need to change the algorithms being tested as well.
>>
> *- Sure, will keep this in mind and will account for this in the proposal.*
>
>> In the section titled "Project Milestones" -
>> Your project timeline looks good: the initial set of things to do before May 30, and the fact that you've set aside the final week for buffer. You have dug down into a week-by-week schedule, which is good. I have some suggestions though:
>>
>> You need to
>> T1. Understand what is happening now, try it out for yourself
>>
> *- Yes, I am following the documentation to simulate benchmarks on my local system.*
>
>> T2. You need to automate this process
>> T3. You need to test that this automated process works as expected (and make it robust)
>> T4. You need to add additional capabilities (like micro-benchmarks and/or parameterizing the tests and/or running it with sparse and dense sets)
>>
> *- I will account for T3 and T4 more explicitly in my proposal.*
>
>> For each of the tasks that you mention in your deliverables, could you please think about how you'd spend each week doing either T1-3 for a deliverable that is now being done manually and T4 for one that is not being done at all right now?
>> Please revisit some of the tasks on your timeline with this in mind.
>>
>> I'd also ask that you set some deliverable(s) for phase 1 (due on June 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>>
>> A suggestion for the deliverables, if you wanted to be really ambitious and complete every task possible:
>> Phase 1 - implement infrastructure to launch perf suite and to detect errors & report performance numbers in a plain text file
>> Phase 2 - implement scripts to compare performance against older versions of SystemML and other packages (Spark MLlib) and implement mechanism to generate report(s) with errors and performance information in a spreadsheet or pdf or on a web interface
>> Phase 3 - add additional perf tests for more algorithms, different sparsity thresholds and optimization levels and include them in the reports. Also implement and test scripts to run the perf suite on a cloud provider, doing this through a web UI.
>>
>> Something very conservative could be
>> Phase 1 - automate perf suite and report perf numbers
>> Phase 2 - make error reporting and handling robust, compare against previous versions of SystemML
>> Phase 3 - add additional algorithms to the test suite
>>
> *- I would prefer taking the conservative approach here.*
>
>> These are just suggestions, tweak them as you see fit.
>> Having a deliverable attached to the end of a phase is a good thing.
>>
>> Hope I am not being too critical and hopefully this helps
>>
> *- Not at all, I appreciate your feedback and detailed reply.*
>
> *- Could you also let me know the co-mentors for this project? I am working on the proposal and will share an updated version soon.*
>
>> -Nakul
>>
>> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>
>>> Hello All,
>>> Based on "SYSTEMML-1451" and relevant SystemML source code, I have updated the draft proposal. Please have a look and share your valuable feedback.
>>>
>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>
>>> Regards,
>>> Krishna
>>>
>>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>
>>>> Hello All,
>>>> I have created a proposal for
>>>>
>>>> d) Perftest : automated performance tests of algorithms
>>>> (I am most comfortable with bash scripting and Python)
>>>>
>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>
>>>> Please share your feedback on the proposal. If someone from the community could mentor, it would be great.
>>>>
>>>> Regards,
>>>> Krishna
>>>>
>>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>>
>>>>> Thanks Nakul,
>>>>> Replied to the JIRA thread.
>>>>>
>>>>> Cheers,
>>>>> Krishna
>>>>>
>>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <naku...@gmail.com> wrote:
>>>>>
>>>>>> Hi Krishna,
>>>>>>
>>>>>> We have 2 proposals up:
>>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>>
>>>>>> Would you be interested in any of these?
>>>>>> If you are specifically interested in the Python DSL project, we can look for more volunteers or I could just volunteer to mentor it.
>>>>>>
>>>>>> -Nakul
>>>>>>
>>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <naku...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Krishna,
>>>>>>>
>>>>>>> We are working on putting together some proposals. I created one for a GPU-based project.
>>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>>> Be on the lookout for more.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nakul
>>>>>>>
>>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Adina and Arvind, thank you for your reply,
>>>>>>>> I am open to writing a proposal with a mentor and would appreciate it if we could take action quickly on this.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Krishna
>>>>>>>>
>>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu> wrote:
>>>>>>>>
>>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe SystemML could still participate as part of ASF if interested (record your ideas in JIRA and put gsoc2017 as label). See messages on this subject on the community.apache.org mailing list from Ulrich Stark.
>>>>>>>> > The following page also has useful info, even if it is not updated for this year: http://community.apache.org/gsoc.html - mentors need to register very soon.
>>>>>>>> >
>>>>>>>> > Best regards,
>>>>>>>> > Adina
>>>>>>>> >
>>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid> wrote:
>>>>>>>> >
>>>>>>>> > > Thanks Krishna for your interest.
>>>>>>>> > > Unfortunately we could not submit a topic to GSoC on time. However, please feel free to leverage SystemML for your use cases and make any possible contributions to SystemML.
>>>>>>>> > > Please let us know if you have any questions.
>>>>>>>> > >
>>>>>>>> > > Arvind Surve | Spark Technology Center | http://www.spark.tc/
>>>>>>>> > >
>>>>>>>> > > From: Krishna Kalyan <krishnakaly...@gmail.com>
>>>>>>>> > > To: dev@systemml.incubator.apache.org
>>>>>>>> > > Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>>> > > Subject: Re: GSoc 2017
>>>>>>>> > >
>>>>>>>> > > Hello All,
>>>>>>>> > > A gentle ping. Student applications open in a couple of days. I would like to work on 'Support for Python DSLs'.
>>>>>>>> > > However, for now I am not sure how to proceed.
>>>>>>>> > >
>>>>>>>> > > Thank you,
>>>>>>>> > > Krishna
>>>>>>>> > >
>>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <dusenberr...@gmail.com> wrote:
>>>>>>>> > >
>>>>>>>> > > > Yeah, helping to build out our Python DSL into a full-out replacement for the current "DML" language would be great, and we'd be quite supportive!
>>>>>>>> > > >
>>>>>>>> > > > -Mike
>>>>>>>> > > >
>>>>>>>> > > > --
>>>>>>>> > > >
>>>>>>>> > > > Mike Dusenberry
>>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>> > > >
>>>>>>>> > > > Sent from my iPhone.
>>>>>>>> > > >
>>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschue...@posteo.de wrote:
>>>>>>>> > > > >
>>>>>>>> > > > > Hi Krishna,
>>>>>>>> > > > >
>>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>>> > > > >
>>>>>>>> > > > > From your list I personally think that a) and d) would be well suited for projects; a good Python DSL especially is a high priority.
>>>>>>>> > > > >
>>>>>>>> > > > > We will apply as an organization to GSoC once organization applications are open (Jan. 19th) and I think we will find mentors for at least a) and d). If you already want to take a look at what is currently there, I suggest looking at our Python APIs and documentation.
>>>>>>>> > > > > If you want to take on the DSL project, it might also be a good idea to look into the DML documentation and related papers to see what we need to support.
>>>>>>>> > > > >
>>>>>>>> > > > > The proposals will probably circulate on the mailing list, too, so keep an eye on that :)
>>>>>>>> > > > >
>>>>>>>> > > > > -Felix
>>>>>>>> > > > >
>>>>>>>> > > > > On 12.01.2017 23:13, Krishna Kalyan wrote:
>>>>>>>> > > > >> Hello All,
>>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>>> > > > >> c) GPU support
>>>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>>>> > > > >> I am also willing to work on the tasks that the SystemML community thinks are important.
>>>>>>>> > > > >> Regards,
>>>>>>>> > > > >> Krishna
>>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberr...@gmail.com> wrote:
>>>>>>>> > > > >>> Hi Krishna! Welcome, and thanks for your interest!
>>>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSOC project.
>>>>>>>> > > > >>> We've started another thread to discuss possible new proposals, and we would also be quite interested in any particular proposal that you might like to generate tailored towards your interests.
>>>>>>>> > > > >>> Copied from the other thread, some possible ideas could include: building out a full ML demo to solve a real, large-scale problem that would benefit from a distributed approach; overall performance improvements that address a full class, or wider area, of ML algorithms, rather than a single, specific script; infrastructure for [performance] testing, and identification of wide areas of improvement; helping with building out fully-featured, clean, well-tested DSLs in Python & Scala (we've started, but it would be good to continue stressing them -- we could even aim to replace DML with the DSLs); etc. Overall, we want to improve the ability of the user to work on a wide range of large-scale, distributed ML problems in a simple and easy manner on top of Spark.
>>>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even begin discussions or contributions on any of the items.
>>>>>>>> > > > >>> You could also view our recent roadmap discussion thread on the mailing list, starting with the first email [2]:
>>>>>>>> > > > >>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>>> > > > >>> [2]: http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad740599...@gmail.com%3E
>>>>>>>> > > > >>> - Mike
>>>>>>>> > > > >>> --
>>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to get you familiarized with SystemML.
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list and start working on a project proposal which could be based on the recent Roadmap discussion [1].
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participates in GSOC, take a look at the following resources [2] and [3], and don't hesitate to ask questions here.
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > [1] https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>>> > > > >>> > [3] http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > > Hello Developers,
>>>>>>>> > > > >>> > > I am Krishna, currently a 2nd-year Masters student (MSc. in Data Mining) in Barcelona, studying at Université Polytechnique de Catalogne.
>>>>>>>> > > > >>> > > I am interested in contributing to SystemML this year under the GSoC program.
>>>>>>>> > > > >>> > > Could anyone please guide me on how to go about it? (I understand that I need to write a proposal)
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > Related Experience:
>>>>>>>> > > > >>> > > My masters is mostly focussed on data mining techniques. Before my masters, I was a data engineer with IBM (India). I was responsible for managing a 50-node Hadoop cluster for more than a year. Most of my time was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > My Webpage
>>>>>>>> > > > >>> > > kkalyan.in
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > Thank you so much,
>>>>>>>> > > > >>> > > Krishna
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > --
>>>>>>>> > > > >>> > Luciano Resende
>>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Dr. Adina Crainiceanu
>>>>>>>> > Associate Professor, Computer Science Department
>>>>>>>> > United States Naval Academy
>>>>>>>> > 410-293-6822
>>>>>>>> > ad...@usna.edu
>>>>>>>> > http://www.usna.edu/Users/cs/adina/