Hello Nakul,

My comments are in *italics* below.

On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <naku...@gmail.com> wrote:
> Hi Krishna,
>
> Here are some questions/remarks I have about parts of your proposal:
>
> In the section titled Summary -
>
> "The systematic evaluation of performance can be measured with performance tests and micro-benchmarks"
> We currently do not have any micro-benchmarks. Do you plan on adding any? (It would be awesome, but remember to keep the number of tasks reasonable given the time frame and your familiarity with the project.)

*- Removed micro-benchmarks from the proposal.*

> Your summary section feels like it's generally applicable to performance testing on any project, which is good. However, when it comes to talking about what you'd actually be doing, I see - "build a benchmark infrastructure and conduct experiments that compare different choices in critical parts (sparsity thresholds, optimisation decisions, etc.)".

*- I agree and have made these changes.*

> Going over each point:
>
> 1. "build a benchmark infrastructure" - ok, I guess this subsumes pretty much all the tasks involved.
> 2. "conduct experiments" - sure, although I think you mean testing your benchmarking infrastructure; please correct me if this is not what you meant.
> 3. "that compare different choices in critical parts"
>    a. "sparsity thresholds" - awesome. You'd need to figure out what SystemML already does and what to add.
>    b. "optimization decisions" - could you provide an example or two of what exactly you mean by this? Do you mean to enable and/or disable certain optimizations, run the perf suite, and also automate the process? Or something else?
>    c. "etc" - more detail would be nice here. It would be nice to know what exactly you are committing to.

*- Will add more details in this section.*

> In the section titled Deliverables -
>
> You mention
> - "automation for all performance tests" - awesome! This is the primary task.
> - "automatic scripts to test performance on a cloud provider" - this is great.
> - "web dashboard" - awesome! This is a nice-to-have.
>
> But before the "cloud provider" and "web dashboard" tasks, we'd like to robustly check for errors, record performance numbers and generate reports (Tasks 2-6 on https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've mentioned some of these tasks in your "Project Milestones" section as "Understand metrics to be captured like time, memory, errors". It'd be good to put them here as well.

*- Will add this information under Deliverables.*

> Remember, you might also need to change the way SystemML reports errors and performance numbers to complete your tasks. You, along with the currently active members of SystemML, might need to change the algorithms being tested as well.

*- Sure, will keep this in mind and will account for this in the proposal.*

> In the section titled "Project Milestones" -
> Your project timeline looks good: the initial set of things to do before May 30, and the fact that you've set aside the final week as a buffer. You have dug down into a week-by-week schedule, which is good. I have some suggestions though:
>
> You need to
> T1. Understand what is happening now; try it out for yourself.

*- Yes, I am following the documentation to simulate benchmarks on my local system.*

> T2. Automate this process.
> T3. Test that this automated process works as expected (and make it robust).
> T4. Add additional capabilities (like micro-benchmarks and/or parameterizing the tests and/or running them with sparse and dense data sets).

*- I will account for T3 and T4 more explicitly in my proposal.*

> For each of the tasks that you mention in your deliverables, could you please think about how you'd spend each week doing either T1-T3 for a deliverable that is currently being done manually, or T4 for one that is not being done at all right now? Please revisit some of the tasks on your timeline with this in mind.
>
> I'd also ask that you set some deliverable(s) for Phase 1 (due on June 26), Phase 2 (due on July 26) and the final phase (ends on Aug 29).
>
> A suggestion for the deliverables, if you wanted to be really ambitious and complete every task possible:
> Phase 1 - implement infrastructure to launch the perf suite, detect errors and report performance numbers in a plain text file.
> Phase 2 - implement scripts to compare performance against older versions of SystemML and other packages (Spark MLlib), and implement a mechanism to generate report(s) with errors and performance information in a spreadsheet, a PDF or a web interface.
> Phase 3 - add additional perf tests for more algorithms, different sparsity thresholds and optimization levels, and include them in the reports. Also implement and test scripts to run the perf suite on a cloud provider, and do this through a web UI.
>
> Something very conservative could be:
> Phase 1 - automate the perf suite and report perf numbers.
> Phase 2 - make error reporting and handling robust; compare against previous versions of SystemML.
> Phase 3 - add additional algorithms to the test suite.

*- I would prefer taking the conservative approach here.*

> These are just suggestions; tweak them as you see fit. Having a deliverable attached to the end of a phase is a good thing.
>
> Hope I am not being too critical, and hopefully this helps.

*- Not at all; I appreciate your detailed feedback.*
*- Could you also let me know the co-mentors for this project? I am working on the proposal and will share an updated version soon.*

> -Nakul
>
> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>
>> Hello All,
>>
>> Based on "SYSTEMML-1451" and the relevant SystemML source code, I have updated the draft proposal. Please have a look and share your valuable feedback.
>>
>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>
>> Regards,
>> Krishna
>>
>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I have created a proposal for
>>>
>>> d) Perftest: automated performance tests of algorithms
>>> (I am most comfortable with bash scripting and Python)
>>>
>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>
>>> Please share your feedback on the proposal. If someone from the community could mentor, it would be great.
>>>
>>> Regards,
>>> Krishna
>>>
>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>
>>>> Thanks Nakul,
>>>>
>>>> Replied to the JIRA thread.
>>>>
>>>> Cheers,
>>>> Krishna
>>>>
>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <naku...@gmail.com> wrote:
>>>>
>>>>> Hi Krishna,
>>>>>
>>>>> We have 2 proposals up:
>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>
>>>>> Would you be interested in any of these?
>>>>> If you are specifically interested in the Python DSL project, we can look for more volunteers, or I could just volunteer to mentor it.
>>>>>
>>>>> -Nakul
>>>>>
>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <naku...@gmail.com> wrote:
>>>>>
>>>>>> Hi Krishna,
>>>>>>
>>>>>> We are working on putting together some proposals. I created one for a GPU-based project:
>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>> Be on the lookout for more.
>>>>>>
>>>>>> Thanks,
>>>>>> Nakul
>>>>>>
>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello Adina and Arvind, thank you for your replies.
>>>>>>> I am open to writing a proposal with a mentor and would appreciate it if we could take action quickly on this.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Krishna
>>>>>>>
>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu> wrote:
>>>>>>>
>>>>>>> > The Apache Software Foundation applied and was accepted for GSoC. I believe SystemML could still participate as part of the ASF if interested (record your ideas in JIRA and put gsoc2017 as the label). See messages on this subject on the community.apache.org mailing list from Ulrich Stark.
>>>>>>> > The following page also has useful info, even if it is not updated for this year: http://community.apache.org/gsoc.html - mentors need to register very soon.
>>>>>>> >
>>>>>>> > Best regards,
>>>>>>> > Adina
>>>>>>> >
>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid> wrote:
>>>>>>> >
>>>>>>> > > Thanks Krishna for your interest.
>>>>>>> > > Unfortunately, we could not submit a topic to GSoC on time. However, please feel free to leverage SystemML for your use cases and to contribute to SystemML where possible.
>>>>>>> > > Please let us know if you have any questions.
>>>>>>> > >
>>>>>>> > > Arvind Surve | Spark Technology Center | http://www.spark.tc/
>>>>>>> > >
>>>>>>> > > From: Krishna Kalyan <krishnakaly...@gmail.com>
>>>>>>> > > To: dev@systemml.incubator.apache.org
>>>>>>> > > Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>> > > Subject: Re: GSoc 2017
>>>>>>> > >
>>>>>>> > > Hello All,
>>>>>>> > > A gentle ping. Student applications open in a couple of days. I would like to work on 'Support for Python DSLs'.
>>>>>>> > > However, for now I am not sure how to proceed.
>>>>>>> > >
>>>>>>> > > Thank you,
>>>>>>> > > Krishna
>>>>>>> > >
>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <dusenberr...@gmail.com> wrote:
>>>>>>> > >
>>>>>>> > > > Yeah, helping to build out our Python DSL into a full-out replacement for the current "DML" language would be great, and we'd be quite supportive!
>>>>>>> > > >
>>>>>>> > > > -Mike
>>>>>>> > > >
>>>>>>> > > > --
>>>>>>> > > >
>>>>>>> > > > Mike Dusenberry
>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>> > > >
>>>>>>> > > > Sent from my iPhone.
>>>>>>> > > >
>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschue...@posteo.de wrote:
>>>>>>> > > > >
>>>>>>> > > > > Hi Krishna,
>>>>>>> > > > >
>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>> > > > >
>>>>>>> > > > > From your list, I personally think that a) and d) would be well suited for projects; a good Python DSL especially is a high priority.
>>>>>>> > > > >
>>>>>>> > > > > We will apply as an organization to GSoC once organization applications are open (Jan. 19th), and I think we will find mentors for at least a) and d). If you already want to take a look at what is currently there, I suggest looking at our Python APIs and documentation.
>>>>>>> > > > > If you want to take on the DSL project, it might also be a good idea to look into the DML documentation and related papers to see what we need to support.
>>>>>>> > > > >
>>>>>>> > > > > The proposals will probably circulate on the mailing list too, so keep an eye on that :)
>>>>>>> > > > >
>>>>>>> > > > > -Felix
>>>>>>> > > > >
>>>>>>> > > > > On 12.01.2017 23:13, Krishna Kalyan wrote:
>>>>>>> > > > >> Hello All,
>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>> > > > >> c) GPU support
>>>>>>> > > > >> d) Perftest: automated performance tests of algorithms
>>>>>>> > > > >> I am also willing to work on the tasks that the SystemML community thinks are important.
>>>>>>> > > > >> Regards,
>>>>>>> > > > >> Krishna
>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberr...@gmail.com> wrote:
>>>>>>> > > > >>> Hi Krishna! Welcome, and thanks for your interest!
>>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSoC project. We've started another thread to discuss possible new proposals, and we would also be quite interested in any particular proposal that you might like to generate tailored towards your interests.
>>>>>>> > > > >>> Copied from the other thread, some possible ideas could include: building out a full ML demo to solve a real, large-scale problem that would benefit from a distributed approach; overall performance improvements that address a full class, or wider area, of ML algorithms, rather than a single, specific script; infrastructure for [performance] testing, and identification of wide areas of improvement; helping with building out fully-featured, clean, well-tested DSLs in Python & Scala (we've started, but it would be good to continue stressing them -- we could even aim to replace DML with the DSLs); etc. Overall, we want to improve the ability of the user to work on a wide range of large-scale, distributed ML problems in a simple and easy manner on top of Spark.
>>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even begin discussions or contributions on any of the items.
>>>>>>> > > > >>> You could also view our recent roadmap discussion thread on the mailing list, starting with the first email [2].
>>>>>>> > > > >>> [1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>> > > > >>> [2]: http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad740599...@gmail.com%3E
>>>>>>> > > > >>> - Mike
>>>>>>> > > > >>> --
>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to get you familiarized with SystemML.
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list and start working on a project proposal, which could be based on the recent Roadmap discussion [1].
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participates in GSoC, take a look at the following resources [2] and [3], and don't hesitate to ask questions here.
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > [1] https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>> > > > >>> > [3] http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > > Hello Developers,
>>>>>>> > > > >>> > > I am Krishna, currently a 2nd-year Master's student (MSc in Data Mining), studying in Barcelona at Université Polytechnique de Catalogne.
>>>>>>> > > > >>> > > I am interested in contributing to SystemML this year under the GSoC program. Could anyone please guide me on how to go about it? (I understand that I need to write a proposal.)
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > Related Experience:
>>>>>>> > > > >>> > > My master's is mostly focused on data mining techniques. Before my master's, I was a data engineer with IBM (India). I was responsible for managing a 50-node Hadoop cluster for more than a year. Most of my time was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > I am most comfortable with Python, followed by R and Scala.
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > My Webpage
>>>>>>> > > > >>> > > kkalyan.in
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>> > > > >>> > >
>>>>>>> > > > >>> > > Thank you so much,
>>>>>>> > > > >>> > > Krishna
>>>>>>> > > > >>> >
>>>>>>> > > > >>> > --
>>>>>>> > > > >>> > Luciano Resende
>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>> >
>>>>>>> > --
>>>>>>> > Dr. Adina Crainiceanu
>>>>>>> > Associate Professor, Computer Science Department
>>>>>>> > United States Naval Academy
>>>>>>> > 410-293-6822
>>>>>>> > ad...@usna.edu
>>>>>>> > http://www.usna.edu/Users/cs/adina/