To put things in perspective, I believe Microsoft (which could potentially devote a lot of resources to Lucene) now uses Lucene through Powerset, and I don't think those folks are contributing back. I know of several other companies that do the same, and of many potential contributions that are never submitted because people and their companies do not see the benefit of jumping through the hoops required to get patches committed. A relatively simple patch such as LUCENE-1473 (Serialization) illustrates this well.

For example, suppose a company is developing custom search algorithms. Lucene supports TF/IDF scoring but not much else, so custom algorithms require rewriting a lot of Lucene code. Companies that write new search algorithms do not necessarily want to also rewrite Lucene to make scoring pluggable, since that is outside their scope; they will simply branch the code. It does not help that the core APIs underneath IndexReader are protected or package-private, which assumes a user who is not advanced. It is repeated on the mailing lists that new features will threaten the existing user base, a claim based on opinion rather than fact. More advanced users are currently hindered by the conservatism of the project, and so have naturally stopped trying to submit changes that alter the core non-public code. The rancor comes from users who would benefit from a faster pace and the freedom to be more creative inside the core of Lucene. Because the internals change frequently and without announcement, developing core patches is difficult and frustrating.

Now that Lucene is stable and flexible indexing is being implemented, it would benefit the community to focus on the future. Who exactly is responsible for this? Which of the committers are building for the future? Which are doing bug fixes? What is the process for developing more advanced features in open source? Right now it seems to be one person, Michael McCandless, developing all of the new core code. This is great forward progress; however, it's unclear how others can get involved without getting stampeded by constant changes that all come from one brilliant person. I have asked people such as Michael Busch to collaborate on column-stride fields and received no response.

To me, a good example of volunteers is the people who prepare food and donate their time at soup kitchens with no pay, and no hope of ever being paid, for feeding the hungry.

-J

On Wed, Dec 3, 2008 at 2:52 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

>
> On Dec 3, 2008, at 2:27 PM, Jason Rutherglen (JIRA) wrote:
>
>> Hoss wrote: "sort of mythical 'Lucene powerhouse'"
>> Lucene seems to run itself quite differently from other open source Java
>> projects. Perhaps it would be good to spell out the reasons for the
>> reluctance to move ahead with features that developers work on, that work,
>> but do not go in. Developer contributions seem to be quite low right
>> now, especially compared to neighboring projects such as Hadoop. Is this
>> because fewer people are using Lucene? Or is it due to a reluctance to
>> work with the developer community? Unfortunately, in the eyes of some
>> people who work on search-related projects, the perception is that it is
>> the latter.
>
> Or, could it be that Hadoop is relatively new and in vogue at the moment,
> very malleable and buggy(?), and has a HUGE corporate sponsor who dedicates
> lots of resources to it on a full-time basis, whilst Lucene has been around
> in the ASF for 7+ years (and 12+ years total), has a really large install
> base and thus must move more deliberately, and basically has 1 person who
> gets to work on it full time while the rest of us pretty much volunteer?
> That's not an excuse, it's just the way it is. I, personally, would love to
> work on Lucene all day every day, as I have a lot of things I'd love to
> engage the community on, but the fact is I'm not paid to do that, so I give
> what I can when I can. I know most of the other committers are that way
> too.
>
> Thus, I don't think any one of us has a reluctance to move ahead with
> features or bug fixes. Looking at CHANGES.txt, I see a lot of
> contributors. Looking at java-dev and JIRA, I see lots of engagement with
> the community. Is traffic near its historical high? No, it's not, but
> that isn't necessarily a bad thing. I think it's a sign that Lucene is
> pretty stable.
>
> What we do have a reluctance for are patches that don't have tests (i.e.
> this one), patches that massively change Lucene APIs in non-trivial ways or
> break back compatibility, and patches that are not kept up to date. Are we
> perfect? Of course not. I, personally, would love for there to be a way
> that helps us process a larger volume of patches (note, I didn't say commit
> a larger volume). Hadoop's automated patch tester would be a huge start on
> that, but at the end of the day, Lucene still works the way all ASF
> projects do: via meritocracy and volunteerism. If you want stuff committed,
> keep it up to date, make it manageable to review, document it, and respond
> to questions/concerns with answers as best you can. To that end, a really
> simple question can go a long way towards getting something committed, and
> it simply is: "Hey Luceners, what else can I do to help you review and
> commit LUCENE-XXXX?" Lather, rinse, repeat. Next thing you know, you'll be
> on the receiving end as a committer.
>
> -Grant