On Thu, May 21, 2009 at 05:19:43PM -0400, Michael McCandless wrote:

> Marvin, which solution would you prefer?

Between the two, I'd prefer passing settings as constructor arguments, though
I would be inclined to scope each settings class to an individual class
rather than making it Lucene-wide.

At least that scheme gets locality right.  The global actsAsVersion variable
violates that principle and has the potential to saddle a small number of
users who have done absolutely nothing wrong with bugs that are very, very
hard to hunt down.  That's unfair.
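
To illustrate the locality point, here is a minimal Java sketch, not actual
Lucene code.  The names StopFilterSettings, SimpleStopFilter and
preservePositionGaps are hypothetical, invented for this example.  The
behavior of each filter instance depends only on what was handed to its own
constructor, so changing one caller's settings cannot alter another's results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical settings object scoped to one class, not to the whole library. */
final class StopFilterSettings {
    final boolean preservePositionGaps;

    StopFilterSettings(boolean preservePositionGaps) {
        this.preservePositionGaps = preservePositionGaps;
    }
}

/** Hypothetical filter whose behavior is fixed by its constructor arguments. */
final class SimpleStopFilter {
    private final Set<String> stopWords;
    private final StopFilterSettings settings;

    SimpleStopFilter(Set<String> stopWords, StopFilterSettings settings) {
        this.stopWords = stopWords;
        this.settings = settings;
    }

    /** Returns the surviving tokens, each rendered as "term@position". */
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        int position = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                // Optionally leave a hole where the stop word used to be.
                if (settings.preservePositionGaps) {
                    position++;
                }
                continue;
            }
            kept.add(token + "@" + position);
            position++;
        }
        return kept;
    }
}
```

Two filters built with different settings can coexist in the same process,
which is exactly what a single global variable cannot offer.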

As for analyzers and token streams, the theoretical answer is to make
indexes self-describing via serializable schemas, as discussed on the Lucy dev
list, and as implemented in KinoSearch svn trunk.  With versioning metadata
attached to the index, there is no longer any worry about upgrading analysis
modules, provided that those modules handle their own versioning correctly.

For instance, in KS the Stopalizer always embeds the complete stoplist in the
schema file, so even if we update the "English" stoplist, we don't get invalid
search results for indexes which were created with the old stoplist.
Similarly, it may not be possible to keep around multiple variants of
Snowball, but at least we can fail catastrophically instead of subtly if we
detect that the Snowball version has changed.
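
The two behaviors described above can be sketched in Java as follows.  This
is not the KinoSearch implementation; AnalysisSchema, requireStemmer and the
version strings are hypothetical names chosen for illustration.  The schema
keeps its own copy of the full stoplist, and it refuses to proceed when the
stemmer version recorded at index time differs from the one available now.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-index schema carrying everything analysis depends on. */
final class AnalysisSchema {
    final List<String> stoplist;   // complete stoplist, frozen at index creation
    final String stemmerVersion;   // stemmer version recorded at index creation

    AnalysisSchema(List<String> stoplist, String stemmerVersion) {
        // Defensive copy: later edits to the library's default list do not leak in.
        this.stoplist = Collections.unmodifiableList(new ArrayList<>(stoplist));
        this.stemmerVersion = stemmerVersion;
    }

    /** Fails loudly rather than letting a changed stemmer skew search results. */
    void requireStemmer(String runtimeStemmerVersion) {
        if (!stemmerVersion.equals(runtimeStemmerVersion)) {
            throw new IllegalStateException(
                "Index was built with stemmer " + stemmerVersion
                + " but runtime provides " + runtimeStemmerVersion);
        }
    }
}
```

A caller would invoke requireStemmer once when opening the index, before any
query is analyzed.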

Full-on schema serialization isn't feasible for Lucene, but attaching an
actsAsVersion variable to an index and feeding that to your analyzers would be
a decent start.
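
A rough sketch of that "decent start", under stated assumptions: the version
is stored with the index as a single "actsAsVersion=N" line, and the analyzer
receives it through its constructor.  IndexMetadata, VersionAwareAnalyzer,
the storage format, and the version-30 behavior change are all hypothetical
and do not correspond to any existing Lucene API.

```java
import java.util.Locale;

/** Hypothetical metadata stored alongside one index. */
final class IndexMetadata {
    static final String KEY = "actsAsVersion";
    final int actsAsVersion;

    IndexMetadata(int actsAsVersion) {
        this.actsAsVersion = actsAsVersion;
    }

    /** Renders the metadata as a single "actsAsVersion=N" line. */
    String serialize() {
        return KEY + "=" + actsAsVersion;
    }

    /** Parses a line produced by serialize(). */
    static IndexMetadata parse(String line) {
        String prefix = KEY + "=";
        String trimmed = line.trim();
        if (!trimmed.startsWith(prefix)) {
            throw new IllegalArgumentException("Missing " + KEY + " entry");
        }
        return new IndexMetadata(Integer.parseInt(trimmed.substring(prefix.length())));
    }
}

/** Hypothetical analyzer that takes its version from the index, not a global. */
final class VersionAwareAnalyzer {
    private final int actsAsVersion;

    VersionAwareAnalyzer(IndexMetadata metadata) {
        this.actsAsVersion = metadata.actsAsVersion;
    }

    String normalize(String term) {
        String lowered = term.toLowerCase(Locale.ROOT);
        // Invented behavior change: from version 30 on, strip a trailing "'s".
        if (actsAsVersion >= 30 && lowered.endsWith("'s")) {
            return lowered.substring(0, lowered.length() - 2);
        }
        return lowered;
    }
}
```

An index created under version 24 keeps the old normalization for as long as
its metadata says 24, whatever library version happens to be running.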

Lastly, I think a major Java Lucene release is justified already.  Won't this
discussion die down somewhat if you can get 3.0 out?  If there are issues that
are half done, how about rolling back whatever's in the way?

Marvin Humphrey

