Re: MI clustering

Benson Margulies Fri, 20 Nov 2009 06:44:56 -0800

Responding to your final remark: Yes. I'm going down that path. And if the
labs PMC doesn't cotton to the idea, just s/labs/google code/ until things
develop further.



On Fri, Nov 20, 2009 at 9:32 AM, Grant Ingersoll <[email protected]> wrote:

>
> On Nov 20, 2009, at 9:12 AM, Benson Margulies wrote:
>
> > Grant,
> >
> > I don't mean to belabor this, but I hate to have this public record of us
> > misunderstanding each other quite so relentlessly. So I'm going to try one
> > more time to see if I can phrase my point of view in such a way that we will
> > be better aligned, and if I fail (or if we are indeed really poorly aligned),
> > then so be it.
> >
> > This starts with a question of the mission of Mahout, TLP or not. If the
> > mission of Mahout is to focus on algorithms that are expressed as map-reduce
> > on Hadoop, then, honestly, I don't think this code belongs. I've studied
> > this in depth (and done a weak implementation), Jethran's done two
> > implementations, my friend and colleague Dr. Scott Miller has done a few,
> > and none of us think that this algorithm is going to fit.
>
> Makes sense.  I've always said Mahout is about (and I think others feel the
> same):
> 1. scalable Machine Learning - scale is open to interpretation.  Some
> algorithms simply do not work on M/R, as you point out, but we are still
> interested in making them as fast as possible.  This is well documented in
> our archives.
> 2.  ASL
> 3. Java (we have a Pig PLSI implementation that will likely be committed
> soon, for instance)
>
> In the end, the only ones I feel strongly about are #1 and #2.  Others may
> feel differently.  Personally, I know my C++ skills are rusty, so I'm not a
> likely contributor at the moment, but that shouldn't preclude others.  I
> just want to see Mahout help people solve Machine Learning problems in a
> scalable, commercial friendly way.  I don't particularly care what language
> is used to achieve that assuming there are people to support it.
>
>
> >
> > If, in addition, the project wants to stick with Java programs, even more so.
> > This particular algorithm is one in which none of us see a way to make
> > map-reduce parallelism compensate for the fundamental limitations of Java
> > floating point speed. There may be another way to cluster based on MI that
> > can exploit map-reduce, but this isn't it. Once I get the code posted
> > somewhere, I'll let you all know where, and you are welcome to argue at that
> > point.
> >
> > My net impression is that the Mahout team might want to incorporate code
> > that is outside the map-reduce corral, but is complementary to the broad
> > mission of NLP algorithms, but that the team isn't excited about doing so
> > right now.
>
>
> That is a fair statement.  In 6 mos. I could see there being an NLP
> subproject to the Mahout TLP and this would fit there as a standalone
> subproject, IMO.  I certainly would love to see that.
>
> >
> > Then comes the process issue. I will write at the outset that I was making
> > an incoherent and pretty unreasonable proposal about committer status.
> > Because Java Map-Reduce technology is not applicable, at the moment, to
> > the things we are doing at our place of business, Jethran and I are not
> > well-positioned to pass through the standard procedure for earning
> > committer status on the project just now.
>
> Maybe.  If you put up an initial patch and one of us committed it, then your
> patches on that would be how you would earn it.
>
> > It is true that other Apache projects have adopted
> > committers in nonstandard ways, but, upon reflection, I don't see that as a
> > valid analogy to the situation at hand. If you are curious, I can fill you
> > in off-line as to the amusing tale of how I became a committer on
> > WS-COMMONS.
>
> :-).  I can't speak to other projects.  I'm just basing it on how I've
> viewed things as working in Lucene and the ASF.
>
>
> >
> > I confess that I'm puzzled about your comment about proxy commits.
> > Committers commit other people's work from JIRAs constantly, so that can't
> > be what you are talking about. If the problem is someone misrepresenting
> > work as their own, then that wouldn't arise in this case. If I gave the
> > impression that I planned to mislead someone, I apologize; I didn't mean to.
> > In any case, I think the issue is moot, since I will explore what seems
> > reasonable to the labs with the labs PMC.
> >
>
> My apologies.  I misunderstood your intent.  Not sure why I didn't give you
> the benefit of the doubt knowing you know how all of this stuff works.
>
> So, how does this sound:
> 1. Go to labs for now
> 2. Keep an eye on us here and when we become a TLP, we'll reevaluate MI as
> a subproject replete w/ its own committers and PMC representation?
>
> Cheers,
> Grant
