Re: Feedback on MLlib roadmap process proposal

Joseph Bradley Mon, 23 Jan 2017 17:04:04 -0800

Hi Seth,

The proposal is geared towards exactly the issue you're describing:
providing more visibility into the capacity and intentions of committers.
If there are things you'd add to it or change to improve further, it would
be great to hear ideas!  The past roadmap JIRA has some more background
discussion which is worth looking at too.


Let's break off the MLlib mission discussion into another thread.  I'll
start one now.

Thanks,
Joseph

On Thu, Jan 19, 2017 at 1:51 PM, Felix Cheung <[email protected]>
wrote:

> Hi Seth
>
> Re: "The most important thing we can do, given that MLlib currently has a
> very limited committer review bandwidth, is to make clear issues that, if
> worked on, will definitely get reviewed. "
>
> We are adopting a Shepherd model, as described in Joseph's JIRA: once a
> Shepherd is assigned, they see the work through with the contributor to
> make sure it lands in the target release.
>
> I'm sure Joseph can explain it better than I do ;)
>
>
> _____________________________
> From: Mingjie Tang <[email protected]>
> Sent: Thursday, January 19, 2017 10:30 AM
> Subject: Re: Feedback on MLlib roadmap process proposal
> To: Seth Hendrickson <[email protected]>
> Cc: Joseph Bradley <[email protected]>, <[email protected]>
>
>
>
> +1 to general abstractions like distributed linear algebra.
>
> On Thu, Jan 19, 2017 at 8:54 AM, Seth Hendrickson <
> [email protected]> wrote:
>
>> I think the proposal laid out in SPARK-18813 is well done, and I do think
>> it is going to improve the process going forward. I also really like the
>> idea of getting the community to vote on JIRAs to give some of them
>> priority - provided that we listen to those votes, of course. The biggest
>> problem I see is that we do have several active contributors and those who
>> want to help implement these changes, but PRs are reviewed rather
>> sporadically and I imagine it is very difficult for contributors to
>> understand why some get reviewed and some do not. The most important thing
>> we can do, given that MLlib currently has a very limited committer review
>> bandwidth, is to make clear issues that, if worked on, will definitely get
>> reviewed. A hard thing to do in open source, no doubt, but even if we have
>> to limit the scope of such issues to a very small subset, it's a gain for
>> all, I think.
>>
>> On a related note, I would love to hear some discussion on the higher
>> level goal of Spark MLlib (if this derails the original discussion, please
>> let me know and we can discuss in another thread). The roadmap does contain
>> specific items that help to convey some of this (ML parity with MLlib,
>> model persistence, etc.), but I'm interested in what the "mission" of
>> Spark MLlib is. We often see PRs for brand-new algorithms, which are
>> sometimes rejected and sometimes not. Do we aim to keep implementing more
>> and more algorithms? Or, now that we have a reasonable library of
>> algorithms, is our focus really to make the existing ones
>> faster/better/more robust? Should we aim to make interfaces that are easy
>> for developers to extend with their own custom code (e.g. custom
>> optimization libraries), or do we want to restrict things to out-of-the-box
>> algorithms? Should we focus on more flexible, general abstractions like
>> distributed linear algebra?
>>
>> I was not involved in the project in the early days of MLlib when this
>> discussion may have happened, but I think it would be useful to either
>> revisit it or restate it here for some of the newer developers.
>>
>> On Tue, Jan 17, 2017 at 3:38 PM, Joseph Bradley <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> This is a general call for thoughts about the process for the MLlib
>>> roadmap proposed in SPARK-18813.  See the section called "Roadmap process."
>>>
>>> Summary:
>>> * This process is about committers indicating intention to shepherd and
>>> review.
>>> * The goal is to improve visibility and communication.
>>> * This is fairly orthogonal to the SIP discussion since this proposal is
>>> more about setting release targets than about proposing future plans.
>>>
>>> Thanks!
>>> Joseph
>>>
>>> --
>>>
>>> Joseph Bradley
>>>
>>> Software Engineer - Machine Learning
>>>
>>> Databricks, Inc.
>>>
>>> [image: http://databricks.com] <http://databricks.com/>
>>>
>>
>>
>
>
>


