Responding to your final remark: Yes. I'm going down that path. And if the labs PMC doesn't cotton to the idea, just s/labs/google code/ until things develop further.
On Fri, Nov 20, 2009 at 9:32 AM, Grant Ingersoll <[email protected]> wrote:

>
> On Nov 20, 2009, at 9:12 AM, Benson Margulies wrote:
>
> > Grant,
> >
> > I don't mean to belabor this, but I hate to have this public record of us misunderstanding each other quite so relentlessly. So I'm going to try one more time to see if I can phrase my point of view in such a way that we will be better aligned, and if I fail (or if we are indeed really poorly aligned), then so be it.
> >
> > This starts with a question of the mission of Mahout, TLP or not. If the mission of Mahout is to focus on algorithms that are expressed as map-reduce on Hadoop, then, honestly, I don't think this code belongs. I've studied this in depth (and done a weak implementation), Jethran's done two implementations, my friend and colleague Dr. Scott Miller has done a few, and none of us think that this algorithm is going to fit.
>
> Makes sense. I've always said Mahout is about (and I think others feel the same):
> 1. scalable Machine Learning - scale is open to interpretation. Some algorithms simply do not work on M/R, as you point out, but we are still interested in making them as fast as possible. This is well documented in our archives.
> 2. ASL
> 3. Java (we have a Pig PLSI implementation that will likely be committed soon, for instance)
>
> In the end, the only ones I feel strongly about are #1 and #2. Others may feel different. Personally, I know my C++ skills are rusty, so I'm not a likely contributor at the moment, but that shouldn't preclude others. I just want to see Mahout help people solve Machine Learning problems in a scalable, commercial friendly way. I don't particularly care what language is used to achieve that assuming there are people to support it.
>
> > If in addition, the project wants to stick with Java programs, even more so.
> > This particular algorithm is one in which none of us see a way to make map-reduce parallelism compensate for the fundamental limitations of Java floating point speed. There may be another way to cluster based on MI that can exploit map-reduce, but this isn't it. Once I get the code posted somewhere, I'll let you all know where, and you are welcome to argue at that point.
> >
> > My net impression is that the Mahout team might want to incorporate code that is outside the map-reduce corral, but is complementary to the broad mission of NLP algorithms, but that the team isn't excited about doing so right now.
>
> That is a fair statement. In 6 mos. I could see there being an NLP subproject to the Mahout TLP and this would fit there as a standalone subproject, IMO. I certainly would love to see that.
>
> > Then comes the process issue. I will write at the outset that I was making an incoherent and pretty unreasonable proposal about committer status. Because Java Map-Reduce technology is not applicable, at the moment, to things we are doing at our place of business, Jethran and I are not well-positioned to pass through the standard procedure for earning committer status on the project just now.
>
> Maybe. If you put up an initial patch and one of us committed it, then your patches on that would be how you would earn it.
>
> > It is true that other Apache projects have adopted committers in nonstandard ways, but, upon reflection, I don't see that as a valid analogy to the situation at hand. If you are curious, I can fill you in off-line as to the amusing tale of how I became a committer on WS-COMMONS.
>
> :-). I can't speak to other projects. I'm just basing it on how I've viewed things as working in Lucene and the ASF.
>
> > I confess that I'm puzzled about your comment about proxy commits. Committers commit other people's work from JIRAs constantly, so that can't be what you are talking about. If the problem is someone misrepresenting work as their own, then that wouldn't arise in this case. If I gave the impression that I planned to mislead someone I apologize, I didn't mean to. In any case, I think the issue is moot, since I will explore what seems reasonable to the labs with the labs PMC.
>
> My apologies. I misunderstood your intent. Not sure why I didn't give you the benefit of the doubt knowing you know how all of this stuff works.
>
> So, how does this sound:
> 1. Go to labs for now
> 2. Keep an eye on us here and when we become a TLP, we'll reevaluate MI as a subproject replete w/ its own committers and PMC representation?
>
> Cheers,
> Grant
