The debate above seems pretty complete.

What are positive actions that will make Mahout healthier?

Suggestions from debate:
* Automated patch testing. This would cure the 'rotting patch' problem.
* Chivvying contributors for detailed working notes.
* ?

Personal concepts:
* Regression suite with real data. There have been cases of "Three Cs"
batch jobs slowly or quickly drifting from good outputs. A separate
suite which exercises the various algorithms with real data would help
catch these.
* Regression suite of the Mahout In Action code. Books really help a
project, and their code goes stale. Some way to keep the MIA examples
fresh.
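A minimal sketch of what a drift check in such a regression suite could look like. This is illustrative only, not Mahout code: the class name, the stored "golden" values, and the tolerance are all hypothetical. The idea is simply to compare today's algorithm output on fixed real data against a stored known-good result, so slow drift gets caught.

```java
// Hypothetical drift-detection helper for a regression suite.
// GoldenCheck and the sample values below are illustrative, not Mahout APIs.
public class GoldenCheck {

    /** Returns true if every actual value is within tol of its golden value. */
    public static boolean withinTolerance(double[] golden, double[] actual, double tol) {
        if (golden.length != actual.length) {
            return false; // shape drift is also a regression
        }
        for (int i = 0; i < golden.length; i++) {
            if (Math.abs(golden[i] - actual[i]) > tol) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] golden  = {0.12, 0.87, 0.45}; // stored known-good output
        double[] drifted = {0.12, 0.91, 0.45}; // today's run, slightly off
        System.out.println(withinTolerance(golden, golden, 1e-6));  // matches itself
        System.out.println(withinTolerance(golden, drifted, 1e-2)); // 0.04 > 0.01: drift
    }
}
```

In practice the golden values would live in a checked-in file per algorithm and dataset, regenerated deliberately when an output change is intentional.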



On 10/22/11, Dmitriy Lyubimov <[email protected]> wrote:
> I feel like I am most closely aligned with Grant. Very little to add.
>
> Like it or not, Mahout is a library, not a coherent product such as hbase.
> It's a collection of algorithms connected together with some fairly thin
> structure and persistence glue, but the glue rarely can go much beyond that.
> That naturally presents difficulties with support, as not every committer is
> broadly qualified to advise on every algorithm (as opposed, for example, to
> hbase, which is pretty much a single product and therefore is much easier to
> gain proficiency in).
>
> If we look around at ML projects, e.g. bugs, Vowpal Wabbit, libsvm, they all
> seem to revolve around a single area of ML. Hence they get support in that
> area. There are a few exceptions, like Weka, but they revolve around 'non-big'
> data and therefore use well-known approaches, whereas Mahout almost always
> requires added value to make a method scalable. That added value
> rarely results in a published paper or even decently reviewed working
> notes, which makes supporting the thing even more difficult.
>
> Hence, a few thoughts.
> 1. Request and review more or less detailed working notes from the
> contributor before he vanishes from the radar.
>
> 2. Don't get upset by the multiplicity of open JIRAs. If a JIRA sits around
> unfixed for the upcoming release, just create a special 'backlog' fix target
> and throw it there until the author provides more information.
>
> 3. I suggest reviewing some contributions from a practicality point of view.
> I.e. if the author had a concrete need for his contribution and was using it
> himself, take a more favourable view of it. That would result in the majority
> of contributions being focused on the most common pragmatic needs, rather than
> being a technology in search of a problem. (That's, btw, how my code evolved:
> I coded it not because I had an itch, but because I needed an MR-based LSA
> solution.) In other words, pragmatically necessary things tend to have a
> better chance of being finished and improved upon naturally. But they still
> may take months or even years to evolve into a nicely optimized solution, so
> there's no need to nix something right away. Just throw it in the backlog,
> and even if the author does not reappear for as much as 18 months, don't nix
> it; just let it sit in backlog limbo. These things often don't come easy (to
> me, anyway).
>
> 4. Even though we may not fully understand a method, we can still set some
> standard requirements for contributions. I already mentioned working notes.
> But we may also ask contributors to define standard characteristics, such
> as the number of MapReduce iterations required, the parallelization strategy,
> and flops. It would be ideal if we could also find a way to run and publish
> standard benchmarks on, say, 10G of input, just to see if it smells. It would
> help (me at least) if this data, along with a maturity level, were published
> on the wiki. Also request a method tutorial from the contributor, written to
> the wiki.
> On Oct 22, 2011 10:36 AM, "Benson Margulies" <[email protected]> wrote:
>
>> Drat: I wrote 'is necessarily a badge of shame' when I meant to write
>> 'is not necessarily a badge of shame'.
>>
>


-- 
Lance Norskog
[email protected]
