"Getting out of the comfort zone" sounds like a good thing for me too.
I'd like to help with refactoring and I'd also be willing to dig into
parts of Mahout I'm not yet familiar with.

I don't think we really need a formally defined process but I think more
discussion about the overall direction and "game plans" for releases
would be a very good thing to have, as well as a strategy for using JIRA
which we all agree on. I personally would favor Grant's strategy as it
seems to work well for Lucene.

I think we all are aware of loose ends and I'd like to throw some of
those into the discussion here (some of them have already been
mentioned), trying to provide constructive solutions.

I'm a fan of agile development too and although I don't think we need to
adopt its formal roles, I think we should try to use one of its basic
principles: Implement the simplest solution that works.


My first question would be: What to do with algorithms that are not yet
stable and might not even be a good fit for the M/R paradigm? One idea
would be to annotate these algorithms to make their maturity visible,
another idea would be to create a contrib module and move them there. I'd
be fine with either one, yet I'd favor the contrib module as I consider
it a cleaner solution and it would be a good starting place for new
contributions and contributors. I'd like to see ParallelALS and the
TriangleEnumeration stuff moved there. It would be a good first step
towards cleaning house. We should maybe also move watchmaker there, as I
have the feeling nobody really sees it as a fit for core anymore.
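
To make the annotation idea a bit more concrete, here is a minimal sketch
of what a maturity marker could look like. Everything in it is an
assumption for illustration: the Maturity annotation, the Level enum and
the two stand-in job classes are hypothetical names and are not part of
Mahout.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class MaturityDemo {

  /** Hypothetical maturity levels; the names are illustrative only. */
  enum Level { EXPERIMENTAL, EVOLVING, STABLE }

  /** Hypothetical marker annotation, not an existing Mahout type. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface Maturity {
    Level value();
  }

  /** Stand-in for an algorithm that is not yet considered stable. */
  @Maturity(Level.EXPERIMENTAL)
  static class SomeExperimentalJob { }

  /** Stand-in for a class that carries no maturity marker. */
  static class SomeOtherJob { }

  /** Returns the declared maturity, or null if the class is unannotated. */
  static Level maturityOf(Class<?> type) {
    Maturity marker = type.getAnnotation(Maturity.class);
    return marker == null ? null : marker.value();
  }

  public static void main(String[] args) {
    // Print the maturity declared on the annotated stand-in class.
    System.out.println(SomeExperimentalJob.class.getSimpleName()
        + ": " + maturityOf(SomeExperimentalJob.class));
  }
}
```

With runtime retention the marker would show up in the Javadoc and could
also be read by tooling, e.g. to list all experimental classes. It would
not by itself separate unstable code from core the way a contrib module
would.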

We should try to find a way to include Hector Yee's patches. It's a pity
that we have not been able to include his work for so long now. I'd take
over the engineering part of committing them but someone has to review
the math behind them first. They could start their life in contrib too.

We also need to clean up AbstractJob and unify the way it's used by the
driver classes. Lots of algorithms use it in different and inconsistent
ways. I have not addressed this as someone was working on making all of
our jobs configurable and chainable, but I don't think there has been
much progress on this, so the easiest road to go here might be plain old
refactoring.

Another issue that needs attention and that has already been discussed
is input formats. We need to separate preprocessing from the algorithm
implementation. That is currently broken in, e.g., ItemSimilarityJob and
RecommenderJob, which both offer a myriad of parameters that have to be
kept consistent. Ideally we would have another job that takes the input
and vectorizes it, similar to the tools we already have for textual
data. I tried to start this with PreparePreferenceMatrixJob but I think
this needs more polishing too.

The last thing I see that is of major importance is improving our
wiki. Trying to use our code without looking into the sources is
pretty hard for a couple of algorithms. We should focus on
algorithms that are stable and mature. Ideally we would have an example
that people can work through. Something like Grant's article showing how
to apply our three C's to the Apache mail archives would be great.

--sebastian


On 26.10.2011 20:37, Jeff Eastman wrote:
> I don't imagine that any more work would get done and certainly any new 
> meta-work would need to produce concrete results. What might happen, however, 
> is that by focusing the 'cats' on a specific and clear set of JIRAs we could 
> end up with more coherent releases, better code knowledge by everybody and 
> improved quality due to more eyeballs on sections of our sprawling code base 
> in each release.
> 
> For me personally, spending some time in the classification or recommendation 
> code would be a good thing. It would be outside of my comfort zone initially 
> but I could become productive. If the focus of an epic would be to make some 
> concrete improvements in recommendation, for example, I'd need somebody like 
> you to break it down into bite-sized pieces but I would contribute. We have 
> already discussed further unification of classification and clustering: I 
> could help break that down into small stories that most developers could 
> tackle.
> 
> I'd like to get out of my comfort zone silo, maybe there are other 'cats' who 
> would like to get out of theirs too? On a scale of "let's do a few things 
> well" to "well, let's do a few more things" I think we are way trending to 
> the latter goal. I get that is your concern too. In my day jobs we use Agile 
> to focus our efforts towards the former goal. I'm just wondering if it would 
> work here too.
> 
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]] 
> Sent: Wednesday, October 26, 2011 10:29 AM
> To: [email protected]
> Subject: Re: Improving Our JIRA State
> 
> It's all a fine idea in theory. There are already epic JIRAs out there
> though. I've already tried to organize without much effect. There are
> few 'cats' to herd out there (active committers). I don't think these
> are getting at the problem, which is quite simply big sprawling scope
> versus not enough hands willing to support it.
> 
> The level of interest in planning-to-do here is great, but it's just
> meta-work being done here, and we've had plenty of these chats before.
> They aren't real progress unfortunately. It would be great if there
> were more interest in doing, so we could tackle a larger scope. There
> isn't, so I am pretty certain the focus should be cutting down scope
> and repairing the things that have already long been noted in JIRA.
> 
> That is -- there's a pretty clear to-do list not being done. One can
> say, let's talk about why it's not being done, let's form a new
> process, let's shuffle the papers, let's write new to-dos. Why would
> that not end up with another bigger to-do list? Why is more work going
> to get done?
> 
> 
> On Wed, Oct 26, 2011 at 5:42 PM, Jeff Eastman <[email protected]> wrote:
>> Changing the title, I'm tired of being "demoralized". I want to improve the 
>> state of our JIRAs and planning overall by building upon my previous remarks 
>> (cf. "RE: Demoralized over JIRA state" above).
>>
>> If we wanted to apply an Agile/Scrum process to Mahout development, we could:
>>
>> *        Identify a "Product Owner" to develop "epic" JIRAs to focus our 
>> development and to prioritize our backlog by quarterly release. Each release 
>> would then have a theme and would be a complete set of concrete enhancements 
>> with user-centric goals. Perhaps even a users@ member could take on this 
>> role, IMHO it does not need to be a developer, but someone who can work to 
>> establish and communicate a vision and a roadmap.
>>
>> *        Identify a "Scrum Master" to drive the creation of specific "story" 
>> JIRAs and guide development. This is a bit more like herding cats than 
>> managing people. This probably needs to be a committer as it has much more 
>> technical content and knowledge of the code base. We are all doing Mahout 
>> part-time so the schedule will be less predictable. But with Agile metrics 
>> computable if points are assigned to the stories we could at least measure 
>> our velocity quarterly.
>>
>> Burndowns anybody?
>> Jeff
>>
>>
