On Nov 1, 2010, at 7:54 PM, Sean Owen wrote:

> On Mon, Nov 1, 2010 at 10:59 PM, Grant Ingersoll <[email protected]> wrote:
> 
>> I'm not sure I would word it that way.  A few years is a long time.  In the
>> span of two years Lucene has seen improvements of upwards of 10-50x in terms
>> of indexing and search speed.  You simply never know where innovation is
>> coming from.  This is why you have branches such as trunk, 1.X, etc. and you
>> back port, but to say we will be on 1.x for years to come is a very bad
>> thing, IMO.
>> 
> 
> Yes, and Lucene has always had a clear identity: it's a text indexing and
> search engine.

Ah, but it is more than that, a lot more.  I guess it depends on how you define 
the identity.  For me, Mahout was always conceived as a place for machine 
learning algorithms that helped people solve real problems.  So, our having some 
less-used algorithms that could use some more polish isn't a bad thing.  We 
just need to label them as such and encourage anyone who wants to pick them up 
to contribute back.

> It set a clear expectation for what it is (and isn't) going
> to do and then delivered. As much as there's a risk in defining a project's
> scope and standard narrowly, there's risk in being too loose. I tend to
> believe the latter is a slightly larger risk now. Would I rather have half
> of 3 more algorithms, or more polish on 3 existing ones? More polish.

I'm never going to argue against more polish, but I don't think the two are 
mutually exclusive.  Polish doesn't happen overnight.  People are free to 
contribute where they see fit.  If you care about polish, then by all means 
polish.  I'm thankful you want to polish.  I like to polish sometimes and other 
times I like to do some basic piece of an implementation/algorithm that just 
might be a seed that helps someone else get over a hump, which they can then 
build on and contribute back.

One of the most innovative things that ever happened in Lucene happened in 
Lucene 2.3. A minor release.  All the old capability was pretty well polished 
and "worked" for many, but the new, somewhat less polished stuff blew the old 
stuff away.  It took 6-8 months to polish, but it was useful to a whole lot of 
people w/o the polish right away.  Trunk users are very important for the life 
of a project.

> By all
> means, once things are polished, 1.0 is out, let's let anyone pile in
> innovations for a loose road map for 1.1, 1.2, 2.0 -- a plan around which
> organizations rather than individual hobbyists like me can rally and plan
> and begin to depend. I think the free-for-all approach is just fine for 0.x
> and think it's fine to stay in that mode as long as it takes -- it's kind of
> the definition of "0.x" and I am trying to articulate what it is that "1.x"
> means that's different. What is it?

In my experience, planning in open source with developers who all work 
disparate hours and for disparate companies and disparate "itches" is very hard 
to do.  Doing releases based on a feel is one thing, but saying what exactly 
will be in a release in a specific time period is much, much harder, especially 
given some larger amount of time.  Of course we should try to coordinate, but 
you simply will be hard pressed to turn away good work, or even postpone good 
work simply because it isn't in some plan.  Again, just look at the 2.3 release 
in Lucene.  McCandless shows up one day and says "I have an idea for a 10-50x 
improvement in speed" and then goes about showing it.  We'd have been stupid to 
turn it away or put it off for the next release.   Organizations depend on open 
source when the open source has compelling features and polish that they can 
take advantage of, but you also have to keep in mind, especially with machine 
learning, that sometimes people just need a germ of an idea to go build from.   
Besides, the field is rapidly evolving.  You can see this in the recommender 
space as well as all the other ones.

In summary, I'm for the general notion of saying "we wish to travel in a 
northerly direction", but if we happen upon a nice restaurant on the way, let's 
stop and have a meal b/c we all know we gotta eat at some point in time, so it 
might as well be when we see a place we like.  And, if we happen to end up 
traveling northeast a little bit too, that's not a big deal either.

> 
> 
> 
>> Hmmm.  I hope no one just decides they think they know what they can throw
>> away.  I'm all for deprecation, but to me deprecation is about changes to
>> APIs.  I don't know that we should throw away algorithms.  People can simply
>> choose not to use them.  Open source is evolutionary, not revolutionary.
>> Sometimes it just takes a while for people to realize it is useful to them.
>> Does that mean we should never throw things away?  Of course not.  It just
>> means we need to think about and discuss it.
>> 
> 
> Of course, nobody would delete things without discussion. I think an 'attic'
> concept is fine for this too. I'm not talking about removing code because
> it's old but possibly useful, but because it's not finished, documented,
> tested, or consistent with newer code, and has no foreseeable hope of it.
> Once we get to "1.0", everything is implicitly blessed as "all this code is
> on purpose and we're going to support this for a while". I think we want to
> be able to believe that by 1.0. Not meeting that promise has negative
> consequences, just as retiring something that someone might use sometime does.
> 
> 
> 
>> I don't agree we should "aggressively" turn away code.  It simply isn't how
>> open source works.  Community over code.  There is no crystal ball here and
>> you simply never know where the next good idea is going to come from unless
>> you let things ruminate.  We may not commit it right away or we may
>> encourage the contributor to flesh it out more, but turn away is not the
>> right attitude, IMO.  Open source is about scratching your itch and it's
>> about innovation coming from the seeming middle of nowhere.  Does that
>> introduce some chaos?  Yes.  Does it make for better code in the long run?
>> Absolutely.
>> 
>> 
> I accept the point but want to argue the other side since I don't hear
> enough of the counter-argument.
> 
> Apache doesn't let anyone commit any code they like, community or no. So
> there must be a point on the spectrum between accepting anything and
> accepting nothing we have to find. I only happen to think we will need to
> have a stronger bias towards wanting coherent, tested, documented code
> coming in as the project evolves. Not now if you like -- but by "1.0", or
> else what does that mean?
> 
> Ruminations remain fine. We have patches and branches and still ample wiggle
> room to commit and collaborate iteratively in HEAD between releases.

Agreed.

> 
> I just think you get what you ask for in a case like this. If bits of ideas
> are accepted into the project, we'll end up with lots of people's bits. If
> the bar is higher for quality and consistency, I believe people do match the
> standard they see and hit that bar. We're already talking about people who
> want to do what it takes to contribute something.

Yes and no.  The bar thing is tricky.  You set it too high and you turn 
people/ideas away that can grow into more capabilities.    However, I agree we 
can manage it.  I just don't want it to be any rigid set of rules (not that you 
are proposing it.)

> 
> Community is the reason I think this. I assert that a bit more standards,
> review and roadmap actually attracts more community in the long run. I'm
> thinking of big organizations. Can you picture your Twitters of the world
> using this?

They already are... ;-)

> kind of, bits, in a maintained branch, with local modifications,
> yes. (In fact I think we know of a few big organizations using it kind of
> like this.) That's fine for now but something I think must change before you
> can picture it being used as-is, for the most part. And that's the something
> that's between here and 1.x that I'm trying to articulate.

You will always have early adopters and you will always have "wait and see" 
approaches.  We can keep both happy.  We can manage this all through the notion 
of trunk and a stable branch approach.  It's a pretty well-defined model at the 
ASF and elsewhere.  People who care about polish work on the stable branch.  
People who care about innovation work on trunk.  As the stuff on trunk matures 
it is either backported or spun off into the next stable branch.

> 
> Otherwise I don't know what difference there is between 0.4, 0.5, 0.6, 1.0,
> 2.0?

It's both features and polish, but I see no reason why a particular version 
can't contain something that is officially released as experimental.  Every 
piece of software has its dark areas, regardless of whether it is open or 
proprietary.  To me, it is merely a labeling problem and not so much a problem 
of "it shouldn't be there".

