+1. We should definitely submit a few good project proposals, particularly ones that make it simple for users to work on a wide range of ML problems on top of Spark. This could include: building out a full ML demo that solves a real, large-scale problem that benefits from a distributed approach; overall performance improvements that address a whole class, or wider area, of ML algorithms rather than a single, specific script; infrastructure for [performance] testing and for identifying wide areas of improvement (your example proposal fits here, and is quite nice!); helping build out fully-featured, clean, well-tested DSLs in Python & Scala (we've started, but it would be good to keep stressing them; we could even aim to replace DML with the DSLs); etc. I like the example proposal you've given because it would benefit the entire project rather than a single, isolated area.

- Mike

--
Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

On Fri, Jan 6, 2017 at 11:57 AM, Madison Myers <madisonjmy...@gmail.com> wrote:

> +1 I think it's a great idea, Felix
>
> On Fri, Jan 6, 2017 at 11:54 AM, <fschue...@posteo.de> wrote:
>
> > Hi all,
> >
> > as it just came up on the ML, I want to bring this up again for
> > general discussion. I think we should try to get at least one or two
> > students for this year's GSOC. If you have never heard of GSOC, look
> > here: http://write.flossmanuals.net/gsoc-mentoring/what-is-gsoc/
> > and here: https://developers.google.com/open-source/gsoc/
> >
> > Applications for organizations open on January 19th, and it is a
> > great way of introducing new people to SystemML development and
> > getting more contributors.
> > To apply, we need to propose projects for a 4-month period in which a
> > student works on them full time (May - August). Each proposed project
> > needs one community member to mentor it. In the end, Google decides
> > how many students each project gets, depending on the quality of the
> > proposed ideas.
> > To successfully apply we need (1) good ideas for projects and
> > (2) people willing to mentor those ideas.
> > For an initial brainstorming, I suggest that we first figure out
> > whether we want to participate (which mainly means we need to find
> > people willing to mentor projects) and then start collecting ideas.
> > Ideas can be anything from infrastructure to core development or
> > implementation of new algorithms.
> >
> > Here is a quick example of what a project proposal could look like:
> >
> >
> > Title: Performance Benchmarks and Experiments
> >
> > Description: To make decisions about new features and to re-evaluate
> > old assumptions, we need up-to-date performance statistics on
> > multiple levels of the system and on different architectures (local,
> > distributed, GPU). Performance can be evaluated systematically with
> > performance tests and micro-benchmarks. In this way, changes to the
> > project or alternative implementations (e.g. for low-level linear
> > algebra backends) can be systematically evaluated and compared.
> > (Semi-)automated benchmarks can help make these decisions and
> > challenge assumptions that were made during earlier development. In
> > the course of this project, the student should build a benchmark
> > infrastructure and conduct experiments that compare different
> > choices in critical parts (sparsity thresholds, BLAS backends,
> > optimization decisions, etc.).
> >
> > Expected Outcome: A benchmark suite that can be used to detect
> > regressions or improvements in critical components of the system.
> >
> > Skills required: Java/Scala, some knowledge of benchmarking;
> > preferred: knowledge about high-performance computing and/or
> > distributed systems.
> >
> > Possible Mentors: Matthias, Niketan, Nakul, Felix
> >
> >
> > Let's decide whether we want to apply as an organization!
> >
> > - Felix
>
> --
> Madison J. Myers
> Spark Technology Center, IBM Watson
> UC Berkeley, Master of Information & Data Science '17
> King's College London, MA Political Science '14
> New York University, BA Political Science '12
> LinkedIn <http://linkedin.com/in/madisonjmyers>