On Nov 20, 2009, at 9:12 AM, Benson Margulies wrote:

> Grant,
>
> I don't mean to belabor this, but I hate to have this public record of us
> misunderstanding each other quite so relentlessly. So I'm going to try one
> more time to see if I can phrase my point of view in such a way that we will
> be better aligned, and if I fail (or if we are indeed really poorly aligned),
> then so be it.
>
> This starts with a question of the mission of Mahout, TLP or not. If the
> mission of Mahout is to focus on algorithms that are expressed as map-reduce
> on Hadoop, then, honestly, I don't think this code belongs. I've studied
> this in depth (and done a weak implementation), Jethran's done two
> implementations, my friend and colleague Dr. Scott Miller has done a few,
> and none of us think that this algorithm is going to fit.
Makes sense. I've always said Mahout is about the following (and I think others feel the same):

1. Scalable Machine Learning. Scale is open to interpretation. Some algorithms simply do not work on M/R, as you point out, but we are still interested in making them as fast as possible. This is well documented in our archives.
2. ASL
3. Java (we have a Pig PLSI implementation that will likely be committed soon, for instance)

In the end, the only ones I feel strongly about are #1 and #2. Others may feel differently. Personally, I know my C++ skills are rusty, so I'm not a likely contributor at the moment, but that shouldn't preclude others. I just want to see Mahout help people solve Machine Learning problems in a scalable, commercially friendly way. I don't particularly care what language is used to achieve that, assuming there are people to support it.

>
> If in addition, the project wants to stick with Java programs, even more so.
> This particular algorithm is one in which none of us see a way to make
> map-reduce parallelism compensate for the fundamental limitations of Java
> floating point speed. There may be another way to cluster based on MI that
> can exploit map-reduce, but this isn't it. Once I get the code posted
> somewhere, I'll let you all know where, and you are welcome to argue at that
> point.
>
> My net impression is that the Mahout team might want to incorporate code
> that is outside the map-reduce corral, but is complementary to the broad
> mission of NLP algorithms, but that the team isn't excited about doing so
> right now.

That is a fair statement. In 6 months, I could see there being an NLP subproject to the Mahout TLP and this would fit there as a standalone subproject, IMO. I certainly would love to see that.

>
> Then comes the process issue. I will write at the outset that I was making
> an incoherent and pretty unreasonable proposal about committer status.
> Because Java Map-Reduce technology is not applicable, at the moment, to
> things we are doing at our place of business, Jethran and I are not
> well-positioned to pass through the standard procedure for earning committer
> status on the project just now.

Maybe. If you put up an initial patch and one of us committed it, then your patches on that would be how you would earn it.

> It is true that other Apache projects have adopted
> committers in nonstandard ways, but, upon reflection, I don't see that as a
> valid analogy to the situation at hand. If you are curious, I can fill you
> in off-line as to the amusing tale of how I became a committer on
> WS-COMMONS. :-).

I can't speak to other projects. I'm just basing it on how I've viewed things as working in Lucene and the ASF.

>
> I confess that I'm puzzled about your comment about proxy commits. Committers
> commit other people's work from JIRAs constantly, so that can't be what you
> are talking about. If the problem is someone misrepresenting work as their
> own, then that wouldn't arise in this case. If I gave the impression that I
> planned to mislead someone, I apologize; I didn't mean to. In any case, I
> think the issue is moot, since I will explore what seems reasonable to the
> labs with the labs PMC.
>

My apologies. I misunderstood your intent. Not sure why I didn't give you the benefit of the doubt, knowing you know how all of this stuff works.

So, how does this sound:

1. Go to labs for now
2. Keep an eye on us here and when we become a TLP, we'll reevaluate MI as a subproject replete w/ its own committers and PMC representation?

Cheers,
Grant
