+1 to general abstractions like distributed linear algebra.

On Thu, Jan 19, 2017 at 8:54 AM, Seth Hendrickson <
seth.hendrickso...@gmail.com> wrote:

> I think the proposal laid out in SPARK-18813 is well done, and I do think
> it is going to improve the process going forward. I also really like the
> idea of getting the community to vote on JIRAs to give some of them
> priority - provided that we listen to those votes, of course. The biggest
> problem I see is that we do have several active contributors and those who
> want to help implement these changes, but PRs are reviewed rather
> sporadically and I imagine it is very difficult for contributors to
> understand why some get reviewed and some do not. The most important thing
> we can do, given that MLlib currently has very limited committer review
> bandwidth, is to clearly identify issues that, if worked on, will definitely
> get reviewed. A hard thing to do in open source, no doubt, but even if we
> have to limit the scope of such issues to a very small subset, it's a gain
> for all, I think.
>
> On a related note, I would love to hear some discussion on the higher
> level goal of Spark MLlib (if this derails the original discussion, please
> let me know and we can discuss in another thread). The roadmap does contain
> specific items that help to convey some of this (ML parity with MLlib,
> model persistence, etc.), but I'm interested in what the "mission" of
> Spark MLlib is. We often see PRs for brand new algorithms which are
> sometimes rejected and sometimes not. Do we aim to keep implementing more
> and more algorithms? Or is our focus really, now that we have a reasonable
> library of algorithms, to simply make the existing ones faster/better/more
> robust? Should we aim to make interfaces that developers can easily extend
> with their own custom code (e.g. custom optimization libraries), or do we
> want to restrict things to out-of-the-box algorithms? Should we focus on
> more flexible, general abstractions like
> distributed linear algebra?
>
> I was not involved in the project in the early days of MLlib when this
> discussion may have happened, but I think it would be useful to either
> revisit it or restate it here for some of the newer developers.
>
> On Tue, Jan 17, 2017 at 3:38 PM, Joseph Bradley <jos...@databricks.com>
> wrote:
>
>> Hi all,
>>
>> This is a general call for thoughts about the process for the MLlib
>> roadmap proposed in SPARK-18813.  See the section called "Roadmap process."
>>
>> Summary:
>> * This process is about committers indicating intention to shepherd and
>> review.
>> * The goal is to improve visibility and communication.
>> * This is fairly orthogonal to the SIP discussion since this proposal is
>> more about setting release targets than about proposing future plans.
>>
>> Thanks!
>> Joseph
>>
>> --
>>
>> Joseph Bradley
>>
>> Software Engineer - Machine Learning
>>
>> Databricks, Inc.
>>
>
>
