Hello All,
I have updated the proposal. I hope this one is better. Please share your
feedback.

https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit#

FYI: the student application deadline is April 3, 16:00 UTC.


Regards,
Krishna

On Sun, Apr 2, 2017 at 2:39 PM, Krishna Kalyan <krishnakaly...@gmail.com>
wrote:

> Hello Nakul,
> My comments in *Italics* below.
>
> On Sat, Apr 1, 2017 at 11:27 PM, Nakul Jindal <naku...@gmail.com> wrote:
>
>> Hi Krishna,
>>
>> Here are some questions/remarks I have about parts of your proposal:
>>
>> In the section titled Summary -
>>
>> "The systematic evaluation of performance can be measured with
>> performance tests and micro-benchmarks"
>> We currently do not have any micro benchmarks. Do you plan on adding any?
>> (It would be awesome, but remember to keep the number of tasks reasonable
>> given the time frame and your familiarity with the project)
>>
> *- Removed micro-benchmarks from the proposal.*
>
>>
>> Your summary section feels like it's generally applicable for performance
>> testing on any project, which is good. However, when it comes to talking
>> about what you'd actually be doing, I see - " build a benchmark
>> infrastructure and conduct experiments, that compare different choices in
>> critical parts (sparsity thresholds, optimisation decisions, etc..)".
>>
> *-  I agree and have made these changes.*
>
>> Going over each point:
>>
>> 1. "build a benchmark infrastructure" - OK, I guess this subsumes pretty
>> much all the tasks involved.
>> 2. "conduct experiments" - sure, although I think you mean testing your
>> benchmarking infrastructure. Please correct me if this is not what you
>> meant.
>>
>> 3. "that compare different choices in critical parts"
>> a. "sparsity thresholds" - awesome. You'd need to figure out what
>> SystemML already does and what to add.
>> b. "optimization decisions" - could you provide an example or two of
>> what exactly you mean by this? Do you mean to enable and/or disable certain
>> optimizations, run the perf suite, and also automate the process? Or
>> something else?
>> c. "etc" - more detail would be nice here. It would be nice to know what
>> exactly you are committing to.
>>
> *- Will add more details in this section.*
>>
>> In the section titled Deliverables -
>>
>> You mention
>> - "automation for all performance tests" - awesome! this is the primary
>> task
>> - "automatic scripts to test performance on a cloud provider" - this is
>> great
>> - "web dashboard" - awesome! this is a nice-to-have
>>
>> But before the "cloud provider" and "web dashboard" tasks, we'd like to
>> robustly check for errors, record performance numbers, and generate
>> reports (Tasks 2 - 6 on
>> https://issues.apache.org/jira/browse/SYSTEMML-1451). I see that you've
>> mentioned some of these tasks in your "Project milestones" section as
>> "Understand metrics to be captured like time, memory, errors". It'd be
>> good to put them here as well.
>>
> *- Will add this information under Deliverables*
>
>>
>> Remember, you might also need to change the way SystemML reports errors
>> and performance numbers to complete your tasks. You, along with the
>> currently active members of SystemML might need to change the algorithms
>> being tested as well.
>>
> *- Sure, I will keep this in mind and account for it in the proposal.*
>
>>
>> In the section titled "Project Milestones" -
>> Your project timeline looks good: the initial set of things to do before
>> May 30, and the fact that you've set aside the final week as a buffer. You
>> have dug down into a week-by-week schedule, which is good. I have some
>> suggestions though:
>>
>> You need to
>> T1. Understand what is happening now, try it out for yourself
>>
> *- Yes, I am following the documentation to simulate benchmarks on my
> local system. *
>
>> T2. Automate this process
>> T3. Test that this automated process works as expected (and make it
>> robust)
>> T4. Add additional capabilities (like micro-benchmarks and/or
>> parameterizing the tests and/or running it with sparse and dense sets)
>>
> *- I will account for T3 and T4 more explicitly in my proposal.*
>
>
>> For each of the tasks that you mention in your deliverables, could you
>> please think about how you'd spend each week doing either T1-3 for a
>> deliverable that is now being done manually, or T4 for one that is not
>> being done at all right now?
>> Please revisit some of the tasks on your timeline with this in mind.
>>
>> I'd also ask that you set some deliverable(s) for phase 1 (due on June
>> 26), phase 2 (due on July 26) and the final phase (ends on Aug 29).
>>
>> A suggestion for the deliverables, if you wanted to be really ambitious
>> and complete every task possible :
>> Phase 1 - implement infrastructure to launch perf suite and to detect
>> errors & report performance numbers in a plain text file
>> Phase 2 - implement scripts to compare performance against older versions
>> of SystemML and other packages (Spark MLlib) and implement a mechanism to
>> generate report(s) with errors and performance information in a spreadsheet
>> or PDF or on a web interface
>> Phase 3 - add additional perf tests for more algorithms, different
>> sparsity thresholds and optimization levels and include them in the
>> reports. Also implement and test scripts to run the perf suite on a cloud
>> provider; doing this through a web UI.
>>
>> Something very conservative could be:
>> Phase 1 - automate perf suite and report perf numbers
>> Phase 2 - make error reporting and handling robust, compare against
>> previous versions of SystemML
>> Phase 3 - add additional algorithms to the test suite,
>>
> *- I would prefer taking the conservative approach here.*
>
>>
>> These are just suggestions; tweak them as you see fit.
>> Having a deliverable attached to the end of a phase is a good thing.
>>
>> Hope I am not being too critical and hopefully this helps
>>
> *- Not at all, I appreciate your detailed feedback.*
>
> *- Could you also let me know the co-mentors for this project? I am
> working on the proposal and will share an updated version soon.*
>
>
>> -Nakul
>>
>>
>>
>>
>> On Fri, Mar 31, 2017 at 5:13 PM, Krishna Kalyan <krishnakaly...@gmail.com
>> > wrote:
>>
>>> Hello All,
>>> Based on "SYSTEMML-1451" and  relevant SystemML source code, I have
>>> updated the draft proposal. Please have a look and share your valuable
>>> feedback.
>>>
>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>
>>> Regards,
>>> Krishna
>>>
>>> On Thu, Mar 30, 2017 at 8:20 PM, Krishna Kalyan <
>>> krishnakaly...@gmail.com> wrote:
>>>
>>>> Hello All,
>>>> I have created a proposal for
>>>>
>>>> d) Perftest : automated performance tests of algorithms
>>>> (I am most comfortable with bash scripting and Python)
>>>>
>>>> https://docs.google.com/document/d/1DKWZTWvrvs73GYa1q3XEN5GFo8ALGjLH2DrIfRsJksA/edit?usp=sharing
>>>>
>>>> Please share your feedback on the proposal. If someone from the
>>>> community could mentor, it would be great.
>>>>
>>>> Regards,
>>>> Krishna
>>>>
>>>> On Mon, Mar 27, 2017 at 6:07 PM, Krishna Kalyan <
>>>> krishnakaly...@gmail.com> wrote:
>>>>
>>>>> Thanks Nakul,
>>>>> Replied to the JIRA thread.
>>>>>
>>>>> Cheers,
>>>>> Krishna
>>>>>
>>>>> On Mon, Mar 27, 2017 at 2:51 PM, Nakul Jindal <naku...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Krishna,
>>>>>>
>>>>>> We have 2 proposals up :
>>>>>> https://issues.apache.org/jira/issues/?filter=12339687&jql=project%20%3D%20SYSTEMML%20AND%20labels%20%3D%20gsoc2017%20ORDER%20BY%20created%20DESC
>>>>>>
>>>>>> Would you be interested in any of these?
>>>>>> If you are specifically interested in the Python DSL project, we can
>>>>>> look for more volunteers or I could just volunteer to mentor it.
>>>>>>
>>>>>> -Nakul
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 24, 2017 at 12:05 PM, Nakul Jindal <naku...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Krishna,
>>>>>>>
>>>>>>> We are working on putting together some proposals. I created one for
>>>>>>> a GPU-based project.
>>>>>>> https://issues.apache.org/jira/browse/SYSTEMML-1436
>>>>>>> Be on the lookout for more.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nakul
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 21, 2017 at 10:01 AM, Krishna Kalyan <
>>>>>>> krishnakaly...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Adina and Arvind, thank you for your replies.
>>>>>>>> I am open to writing a proposal with a mentor and would appreciate it
>>>>>>>> if we could take action quickly on this.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Krishna
>>>>>>>>
>>>>>>>> On Sun, Mar 19, 2017 at 6:02 PM, Adina Crainiceanu <ad...@usna.edu>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> > Apache Software Foundation applied and was accepted for GSOC. I believe
>>>>>>>> > SystemML could still participate as part of ASF if interested (record your
>>>>>>>> > ideas in JIRA and put gsoc2017 as label). See messages on this subject on
>>>>>>>> > the community.apache.org mailing list from Ulrich Stark.
>>>>>>>> > The following page also has useful info, even if it is not updated for this
>>>>>>>> > year: http://community.apache.org/gsoc.html - mentors need to register very
>>>>>>>> > soon.
>>>>>>>> >
>>>>>>>> > Best regards,
>>>>>>>> > Adina
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Sun, Mar 19, 2017 at 3:51 PM, Arvind Surve <ac...@yahoo.com.invalid>
>>>>>>>> > wrote:
>>>>>>>> >
>>>>>>>> > > Thanks Krishna for your interest.
>>>>>>>> > > Unfortunately we could not submit a topic to GSoC on time. However, please
>>>>>>>> > > feel free to leverage SystemML for your use cases and contribute to
>>>>>>>> > > SystemML where possible.
>>>>>>>> > > Please let us know if you have any questions.
>>>>>>>> > >
>>>>>>>> > > Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>>>>>> > >
>>>>>>>> > >       From: Krishna Kalyan <krishnakaly...@gmail.com>
>>>>>>>> > >  To: dev@systemml.incubator.apache.org
>>>>>>>> > >  Sent: Saturday, March 18, 2017 8:18 AM
>>>>>>>> > >  Subject: Re: GSoc 2017
>>>>>>>> > >
>>>>>>>> > > Hello All,
>>>>>>>> > > A gentle ping. Student applications open in a couple of days. I would
>>>>>>>> > > like to work on 'Support for Python DSLs'.
>>>>>>>> > > However, for now I am not sure how to proceed.
>>>>>>>> > >
>>>>>>>> > > Thank you,
>>>>>>>> > > Krishna
>>>>>>>> > >
>>>>>>>> > > On Thu, Jan 12, 2017 at 6:08 PM, <dusenberr...@gmail.com> wrote:
>>>>>>>> > >
>>>>>>>> > > > Yeah helping to build out our Python DSL into a full-out replacement for
>>>>>>>> > > > the current "DML" language would be great, and we'd be quite supportive!
>>>>>>>> > > >
>>>>>>>> > > > -Mike
>>>>>>>> > > >
>>>>>>>> > > > --
>>>>>>>> > > >
>>>>>>>> > > > Mike Dusenberry
>>>>>>>> > > > GitHub: github.com/dusenberrymw
>>>>>>>> > > > LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>> > > >
>>>>>>>> > > > Sent from my iPhone.
>>>>>>>> > > >
>>>>>>>> > > >
>>>>>>>> > > > > On Jan 12, 2017, at 2:58 PM, fschue...@posteo.de wrote:
>>>>>>>> > > > >
>>>>>>>> > > > > Hi Krishna,
>>>>>>>> > > > >
>>>>>>>> > > > > cool to see that you're interested in SystemML!
>>>>>>>> > > > >
>>>>>>>> > > > > From your list I personally think that a) and d) would be well suited for
>>>>>>>> > > > > projects; a good Python DSL in particular is a high priority.
>>>>>>>> > > > >
>>>>>>>> > > > > We will apply as an organization to GSoC once organization applications
>>>>>>>> > > > > are open (Jan. 19th) and I think we will find mentors for at least a) and
>>>>>>>> > > > > d). If you already want to take a look at what is currently there, I
>>>>>>>> > > > > suggest looking at our Python APIs and documentation. If you want to take
>>>>>>>> > > > > on the DSL project it might also be a good idea to look into the DML
>>>>>>>> > > > > documentation and related papers to see what we need to support.
>>>>>>>> > > > >
>>>>>>>> > > > > The proposals will probably circulate on the mailing list, too, so keep
>>>>>>>> > > > > an eye on that :)
>>>>>>>> > > > >
>>>>>>>> > > > > -Felix
>>>>>>>> > > > >
>>>>>>>> > > > > Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>>>>>>>> > > > >> Hello All,
>>>>>>>> > > > >> Thank you for your wonderful replies.
>>>>>>>> > > > >> Tasks that I am interested in:
>>>>>>>> > > > >> a) Support for Python DSLs
>>>>>>>> > > > >> b) Python wrappers for all existing algorithms
>>>>>>>> > > > >> c) GPU support
>>>>>>>> > > > >> d) Perftest : automated performance tests of algorithms
>>>>>>>> > > > >> I am also willing to work on the tasks that the SystemML community
>>>>>>>> > > > >> thinks are important.
>>>>>>>> > > > >> Regards,
>>>>>>>> > > > >> Krishna
>>>>>>>> > > > >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberr...@gmail.com>
>>>>>>>> > > > >> wrote:
>>>>>>>> > > > >>> Hi Krishna!  Welcome, and thanks for your interest!
>>>>>>>> > > > >>> We would definitely be excited to collaborate with you on a GSOC project.
>>>>>>>> > > > >>> We've started another thread to discuss possible new proposals, and we
>>>>>>>> > > > >>> would also be quite interested in any particular proposal that you might
>>>>>>>> > > > >>> like to generate tailored towards your interests.  Copied from the other
>>>>>>>> > > > >>> thread, some possible ideas could include: building out a full ML demo to
>>>>>>>> > > > >>> solve a real, large-scale problem that would benefit from a distributed
>>>>>>>> > > > >>> approach; overall performance improvements that address a full class, or
>>>>>>>> > > > >>> wider area, of ML algorithms, rather than a single, specific script;
>>>>>>>> > > > >>> infrastructure for [performance] testing, and identification of wide areas
>>>>>>>> > > > >>> of improvement; helping with building out fully-featured, clean,
>>>>>>>> > > > >>> well-tested DSLs in Python & Scala (we've started, but it would be good to
>>>>>>>> > > > >>> continue stressing them -- we could even aim to replace DML with the DSLs);
>>>>>>>> > > > >>> etc.  Overall, we want to improve the ability of the user to work on a wide
>>>>>>>> > > > >>> range of large-scale, distributed ML problems in a simple and easy manner
>>>>>>>> > > > >>> on top of Spark.
>>>>>>>> > > > >>> In the meantime, you could explore our recent open issues [1] and even
>>>>>>>> > > > >>> begin discussions or contributions on any of the items.  You could also
>>>>>>>> > > > >>> view our recent roadmap discussion thread on the mailing list, starting
>>>>>>>> > > > >>> with the first email [2]:
>>>>>>>> > > > >>> [1]:
>>>>>>>> > > > >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>>>>>>> > > > >>> [2]:
>>>>>>>> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad740599...@gmail.com%3E
>>>>>>>> > > > >>> - Mike
>>>>>>>> > > > >>> --
>>>>>>>> > > > >>> Michael W. Dusenberry
>>>>>>>> > > > >>> GitHub: github.com/dusenberrymw
>>>>>>>> > > > >>> LinkedIn: linkedin.com/in/mikedusenberry
>>>>>>>> > > > >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1...@gmail.com>
>>>>>>>> > > > >>> wrote:
>>>>>>>> > > > >>> > As some folks have described on this thread, it would be great to get
>>>>>>>> > > > >>> > you familiarized with SystemML.
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > In parallel, I would look for a mentor from the active committer list
>>>>>>>> > > > >>> > and start working on a project proposal which could be based on the
>>>>>>>> > > > >>> > recent Roadmap discussion [1].
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > If you are looking for some guidance on how Apache participates in
>>>>>>>> > > > >>> > GSOC, take a look at the following resources [2] and [3], and don't
>>>>>>>> > > > >>> > hesitate to ask questions here.
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > [1] https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>>>>>>> > > > >>> > [2] http://community.apache.org/gsoc.html
>>>>>>>> > > > >>> > [3] http://www.slideshare.net/luckbr1975/how-mentoring-can-help-you-start-contributing-to-open-source
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > On Thu, Jan 5, 2017 at 3:15 PM, Krishna Kalyan <krishnakaly...@gmail.com>
>>>>>>>> > > > >>> > wrote:
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > > Hello Developers,
>>>>>>>> > > > >>> > > I am Krishna, currently a 2nd-year Master's student (MSc in Data
>>>>>>>> > > > >>> > > Mining) in Barcelona, studying at Université Polytechnique de
>>>>>>>> > > > >>> > > Catalogne. I am interested in contributing to SystemML this year
>>>>>>>> > > > >>> > > under the GSoC program. Could anyone please guide me on how to go
>>>>>>>> > > > >>> > > about it? (I understand that I need to write a proposal.)
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > Related Experience:
>>>>>>>> > > > >>> > > My master's is mostly focused on data mining techniques. Before my
>>>>>>>> > > > >>> > > master's, I was a data engineer with IBM (India). I was responsible
>>>>>>>> > > > >>> > > for managing a 50-node Hadoop cluster for more than a year. Most of
>>>>>>>> > > > >>> > > my time was spent optimising and writing ETL (Apache Pig) jobs.
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > I am the most comfortable with Python followed by R and Scala.
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > My Webpage
>>>>>>>> > > > >>> > > kkalyan.in
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > My Spark Pull Requests
>>>>>>>> > > > >>> > > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> > > Thank you so much,
>>>>>>>> > > > >>> > > Krishna
>>>>>>>> > > > >>> > >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> >
>>>>>>>> > > > >>> > --
>>>>>>>> > > > >>> > Luciano Resende
>>>>>>>> > > > >>> > http://twitter.com/lresende1975
>>>>>>>> > > > >>> > http://lresende.blogspot.com/
>>>>>>>> > > > >>> >
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > --
>>>>>>>> > Dr. Adina Crainiceanu
>>>>>>>> > Associate Professor, Computer Science Department
>>>>>>>> > United States Naval Academy
>>>>>>>> > 410-293-6822
>>>>>>>> > ad...@usna.edu
>>>>>>>> > http://www.usna.edu/Users/cs/adina/
>>>>>>>> >
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
