+1 general abstractions like distributed linear algebra.

On Thu, Jan 19, 2017 at 8:54 AM, Seth Hendrickson <seth.hendrickso...@gmail.com> wrote:
> I think the proposal laid out in SPARK-18813 is well done, and I do think
> it is going to improve the process going forward. I also really like the
> idea of getting the community to vote on JIRAs to give some of them
> priority - provided that we listen to those votes, of course. The biggest
> problem I see is that we do have several active contributors and those who
> want to help implement these changes, but PRs are reviewed rather
> sporadically and I imagine it is very difficult for contributors to
> understand why some get reviewed and some do not. The most important thing
> we can do, given that MLlib currently has a very limited committer review
> bandwidth, is to make clear issues that, if worked on, will definitely get
> reviewed. A hard thing to do in open source, no doubt, but even if we have
> to limit the scope of such issues to a very small subset, it's a gain for
> all I think.
>
> On a related note, I would love to hear some discussion on the higher
> level goal of Spark MLlib (if this derails the original discussion, please
> let me know and we can discuss in another thread). The roadmap does contain
> specific items that help to convey some of this (ML parity with MLlib,
> model persistence, etc.), but I'm interested in what the "mission" of
> Spark MLlib is. We often see PRs for brand new algorithms which are
> sometimes rejected and sometimes not. Do we aim to keep implementing more
> and more algorithms? Or is our focus really, now that we have a reasonable
> library of algorithms, to simply make the existing ones faster/better/more
> robust? Should we aim to make interfaces that are easily extended for
> developers to easily implement their own custom code (e.g. custom
> optimization libraries), or do we want to restrict things to out-of-the-box
> algorithms? Should we focus on more flexible, general abstractions like
> distributed linear algebra?
>
> I was not involved in the project in the early days of MLlib when this
> discussion may have happened, but I think it would be useful to either
> revisit it or restate it here for some of the newer developers.
>
> On Tue, Jan 17, 2017 at 3:38 PM, Joseph Bradley <jos...@databricks.com>
> wrote:
>
>> Hi all,
>>
>> This is a general call for thoughts about the process for the MLlib
>> roadmap proposed in SPARK-18813. See the section called "Roadmap process."
>>
>> Summary:
>> * This process is about committers indicating intention to shepherd and
>> review.
>> * The goal is to improve visibility and communication.
>> * This is fairly orthogonal to the SIP discussion since this proposal is
>> more about setting release targets than about proposing future plans.
>>
>> Thanks!
>> Joseph
>>
>> --
>> Joseph Bradley
>> Software Engineer - Machine Learning
>> Databricks, Inc.
>> <http://databricks.com/>