On Nov 1, 2010, at 7:54 PM, Sean Owen wrote:

> On Mon, Nov 1, 2010 at 10:59 PM, Grant Ingersoll wrote:
>
>> I'm not sure I would word it that way. A few years is a long time. In the
>> span of two years Lucene has seen improvements of upwards of 10-50x in terms
>> of indexing and search speed. You simply never know where innovation is
>> coming from. This is why you have branches such as trunk, 1.X, etc. and you
>> back port, but to say we will be on 1.x for years to come is a very bad
>> thing, IMO.
>>
>
> Yes, and Lucene has always had a clear identity: it's a text indexing and
> search engine.

Ah, but it is more than that, a lot more. I guess it depends on how you define the identity. For me, Mahout was always conceived as a place for machine learning algorithms that helped people solve real problems. So having some less-used algorithms that could use more polish isn't a bad thing. We just need to label them as such and encourage anyone who wants to pick them up to contribute back.

> It set a clear expectation for what it is (and isn't) going
> to do and then delivered. As much as there's a risk in defining a project's
> scope and standard narrowly, there's risk in being too loose. I tend to
> believe the latter is a slightly larger risk now. Would I rather have half
> of 3 more algorithms, or more polish on 3 existing ones? More polish.

I'm never going to argue against more polish, but I don't think the two are mutually exclusive. Polish doesn't happen overnight. People are free to contribute where they see fit. If you care about polish, then by all means polish; I'm thankful you want to. Sometimes I like to polish, and other times I like to write some basic piece of an implementation or algorithm that just might be the seed someone else needs to get over a hump, after which they can contribute back.

One of the most innovative things that ever happened in Lucene happened in Lucene 2.3, a minor release. All the old capability was pretty well polished and "worked" for many, but the new, somewhat less polished stuff blew the old stuff away. It took 6-8 months to polish, but it was useful to a whole lot of people right away, even without the polish. Trunk users are very important for the life of a project.

> By all
> means, once things are polished, 1.0 is out, let's let anyone pile in
> innovations for a loose road map for 1.1, 1.2, 2.0 -- a plan around which
> organizations rather than individual hobbyists like me can rally and plan
> and begin to depend.
> I think the free-for-all approach is just fine for 0.x
> and think it's fine to stay in that mode as long as it takes -- it's kind of
> the definition of "0.x" and I am trying to articulate what it is that "1.x"
> means that's different. What is it?

In my experience, planning in open source with developers who work disparate hours, for disparate companies, on disparate "itches" is very hard to do. Doing releases based on a feel is one thing, but saying exactly what will be in a release within a specific time period is much, much harder, especially as that time period grows. Of course we should try to coordinate, but you will be hard-pressed to turn away, or even postpone, good work simply because it isn't in some plan. Again, just look at the 2.3 release in Lucene. McCandless shows up one day and says "I have an idea for a 10-50x improvement in speed" and then goes about showing it. We'd have been stupid to turn it away or put it off until the next release.

Organizations depend on open source when it has compelling features and polish that they can take advantage of, but you also have to keep in mind, especially with machine learning, that sometimes people just need the germ of an idea to build from. Besides, the field is evolving rapidly. You can see this in the recommender space as well as in all the others.

In summary, I'm for the general notion of saying "we wish to travel in a northerly direction," but if we happen upon a nice restaurant on the way, let's stop and have a meal, b/c we all know we gotta eat at some point, so it might as well be when we see a place we like. And if we happen to end up traveling a little northeast too, that's not a big deal either.

>
>> Hmm. I hope no one just decides they think they know what they can throw
>> away. I'm all for deprecation, but to me deprecation is about changes to
>> APIs. I don't know that we should throw away algorithms.
>> People can simply
>> choose not to use them. Open source is evolutionary, not revolutionary.
>> Sometimes it just takes a while for people to realize it is useful to them.
>> Does that mean we should never throw things away? Of course not. It just
>> means we need to think about and discuss it.
>>
>
> Of course, nobody would delete things without discussion. I think an 'attic'
> concept is fine for this too. I'm not talking about removing code because
> it's old but possibly useful, but because it's not finished, documented,
> tested, or consistent with newer code, and has no foreseeable hope of it.
> Once we get to "1.0", everything is implicitly blessed as "all this code is
> on purpose and we're going to support this for a while". I think we want to
> be able to believe that by 1.0. Not meeting that promise has negative
> consequence just as retiring something that someone might use sometime.
>
>> I don't agree we should "aggressively" turn away code. It simply isn't how
>> open source works. Community over code. There is no crystal ball here and
>> you simply never know where the next good idea is going to come from unless
>> you let things ruminate. We may not commit it right away or we may
>> encourage the contributor to flesh it out more, but turn away is not the
>> right attitude, IMO. Open source is about scratching your itch and it's
>> about innovation coming from the seeming middle of nowhere. Does that
>> introduce some chaos? Yes. Does it make for better code in the long run?
>> Absolutely.
>>
>>
> I accept the point but want to argue the other side since I don't hear
> enough of the counter-argument.
>
> Apache doesn't let anyone commit any code they like, community or no. So
> there must be a point on the spectrum between accepting anything and
> accepting nothing we have to find. I only happen to think we will need to
> have a stronger bias towards wanting coherent, tested, documented code
> coming in as the project evolves.
> Not now if you like -- but by "1.0", or
> else what does that mean?
>
> Ruminations remain fine. We have patches and branches and still ample wiggle
> room to commit and collaborate iteratively in HEAD between releases.

Agreed.

>
> I just think you get what you ask for in a case like this. If bits of ideas
> are accepted into the project, we'll end up with lots of people's bits. If
> the bar is higher for quality and consistency, I believe people do match the
> standard they see and hit that bar. We're already talking about people who
> want to do what it takes to contribute something.

Yes and no. The bar thing is tricky. Set it too high and you turn away people and ideas that could grow into more capabilities. However, I agree we can manage it. I just don't want it to become a rigid set of rules (not that you are proposing that).

>
> Community is the reason I think this. I assert that a bit more standards,
> review and roadmap actually attracts more community in the long run. I'm
> thinking of big organizations. Can you picture your Twitters of the world
> using this?

They already are... ;-)

> kind of, bits, in a maintained branch, with local modifications,
> yes. (In fact I think we know of a few big organizations using it kind of
> like this.) That's fine for now but something I think must change before you
> can picture it being used as-is, for the most part. And that's the something
> that's between here and 1.x that I'm trying to articulate.

You will always have early adopters, and you will always have people who prefer to "wait and see." We can keep both happy by managing all of this through a trunk-plus-stable-branch approach. It's a pretty well-defined model at the ASF and elsewhere. People who care about polish work on the stable branch. People who care about innovation work on trunk. As the stuff on trunk matures, it is either backported or spun off into the next stable branch.

>
> Otherwise I don't know what difference there is between 0.4, 0.5, 0.6, 1.0,
> 2.0?

It's both features and polish, but I see no reason why a particular version can't contain something that is officially released as experimental. Every piece of software has its dark areas, whether it is open source or proprietary. To me, it is merely a labeling problem, not so much a problem of "it shouldn't be there."
