Otis Gospodnetic wrote:

Dmitry and others,

One of the more frequently requested features is 'conceptual
search', or 'search by similarity', etc.  Lucene does not store term
vectors in its index, so such searches cannot be supported.
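
For context, a term vector is the list of terms in one document together with their frequencies, and similarity search compares such vectors. The following is a minimal, self-contained sketch of that idea in plain Java. It is not Lucene's API and not the code from Dmitry's patches; the class name `TermVectorSketch`, the whitespace/punctuation tokenizer, and the choice of cosine similarity are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Lucene or of the patches discussed here.
public class TermVectorSketch {

    // Build a term-frequency vector (term -> count) for one document.
    // Tokenization is a naive split on non-word characters (an assumption).
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                vector.merge(token, 1, Integer::sum);
            }
        }
        return vector;
    }

    // Euclidean length of a term vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity between two term vectors; 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termVector("term vectors enable similarity search");
        Map<String, Integer> d2 = termVector("similarity search needs term vectors");
        System.out.println("similarity = " + cosine(d1, d2));
    }
}
```

Without per-document vectors stored in the index, a search engine would have to re-read and re-tokenize each document to compute a score like this, which is presumably why the feature depends on term vector storage.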

However, almost two years ago, Dmitry provided a large set of patches
that added term vector support to Lucene.  We never applied those
patches for some reason, even though the patches looked really good.
The other day I looked at Dmitry's two-year-old email again.
I applied a few of the diffs to my copy of Lucene and added the new
classes that Dmitry wrote for term vector support to the source tree.
Unfortunately, lots of classes have changed over the last two years,
and not all of the patches will apply.

I was wondering, Dmitry, if you have your term vector changes
integrated with the current version of Lucene.  If you do, would it be
possible for you to send the patches again?

Well, it's actually not that simple. The Lucene code that we use is pretty heavily modified (by the term vector patch and by a few later additions, such as the TermEnum patch from about six months ago). What I'd like to do with the file handles is make the changes in the current Lucene sources, do all the testing there, and then port the changes into our version of Lucene. This way the contribution will be readily usable.

The term vector patches that I sent before are out there, so feel free to incorporate them into Lucene, but I can't really spend time on them right now. Plus, from an IP point of view, those changes allow the company I work for to do things with Lucene that our competitors can't readily do, and these things happen to be key to our value proposition, so I really can't publish any more of those changes yet. Now, if Lucene acquired a similar capability from what I already published or from some other source, perhaps we could contribute to that effort later in smaller ways.

A great thing about the Apache license is that it allows this kind of flexibility (IANAL). This is just where I'm comfortable drawing the line right now. Sorry if this comes across as ungrateful... We are really very appreciative of the Lucene project and of the community, and we'll try to contribute in other ways, but this one is not available any more/yet. :)

Also, I noticed that a large portion of those patches contained a good
amount of documentation (code comments, Javadocs).  Dmitry obviously
studied the code in depth :)  I will try extracting at least the
documentation from that contribution.

Yes, I did read it end to end - boy, was that a learning experience! :)


Finally, Dmitry, if you have term vector support in your local copy of the current Lucene sources, how are you going to make patches containing only the changes that you outlined in the recent email? Are the term vector changes gone, or...?

Like I said above, I'll be working with the current Lucene from CVS until the changes are final, and then I will port them to my copy of Lucene.
Perhaps later we can get back to the TermEnum changes as well. Those I could contribute (well, actually I already did :) ). The gist there is that I was able to substantially reduce garbage collection on certain operations, but I think someone reported that the code did not work correctly in some cases (probably uses of Lucene that we do not run into in our environment).


Thanks for digging the term vectors back out, Otis.
Dmitry.


