Re: Lucene's default settings & back compatibility

Michael McCandless Fri, 22 May 2009 10:22:55 -0700

On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey
<mar...@rectangular.com> wrote:
>
>> when working on 3.1 if we make some great improvement, I'd like new users in
>> 3.1 to see the improvement by default.
>
> Sounds like an argument for more frequent major releases.


Yeah.  Or "rebranding" what we now call minor as major releases, by
changing our policy ;) Or "rebranding" to Lucene 2009.

But: localized improvements (like the sizable performance gain from
turning off scoring when sorting by field) should not have to wait for
a major release to benefit new users.  I think they should be on by
default on the next release.

Will Lucy do scoring when sorting by field, by default?

>> On thinking about it more... automagically storing the "actsAsVersion"
>> in the index, and then having IndexWriter (for example) ask the
>> analyzer for a tokenStream matching that version, seems a little too
>> sneaky.
>
> Can you elaborate?
>
> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> have to be passed a Schema, which contains all the Analyzers.  Analyzers
> aren't satellite classes under this model -- they are a fixed property of a
> FullTextType field spec.  Think of them as baked into an SQL field definition.
>
> You can create a Schema from scratch to pass to the QueryParser, but it's
> easier to just get it from the Searcher.  Translating to Java...
>
>   Searcher searcher = new Searcher("/path/to/index");
>   QueryParser qparser = new QueryParser(searcher.getSchema());
>
> I don't see how that's so different from getting an analyzer actsAsVersion
> number from the index.

I agree in KS/Lucy, it works well, because you must explicitly pass in
Schema to each of the satellite classes.

But in Lucene, if whenever IndexWriter asked analyzer for a
tokenstream, it passed in the actsAsVersion it had loaded from the
index, that's sneaky.  I'd rather have it explicit (like KS/Lucy), so
you'd have to call IndexWriter.getActsAsVersion, then pass that into your
analyzer when you create it.  It's the automatic under-the-hood
passing that makes me nervous and I think would confuse users.
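
A minimal sketch of the explicit approach, to make the contrast concrete.
Everything here is hypothetical: SketchWriter, SketchAnalyzer, and the
version numbers 24 and 29 are stand-ins made up for illustration, not
Lucene's actual API. The point is only that the caller reads the version
and hands it to the analyzer itself, so nothing is passed under the hood.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: none of these types are Lucene's real classes.
public class ExplicitVersionSketch {

    /** Stand-in for a writer that knows which version the index acts as. */
    static final class SketchWriter {
        private final int actsAsVersion;

        SketchWriter(int actsAsVersion) {
            this.actsAsVersion = actsAsVersion;
        }

        int getActsAsVersion() {
            return actsAsVersion;
        }
    }

    /** Stand-in analyzer; assume it began lowercasing tokens in "version 29". */
    static final class SketchAnalyzer {
        static final int LOWERCASING_SINCE = 29; // made-up version number

        private final int actsAsVersion;

        SketchAnalyzer(int actsAsVersion) {
            this.actsAsVersion = actsAsVersion;
        }

        List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String token : text.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // empty input yields one empty string; skip it
                }
                tokens.add(actsAsVersion >= LOWERCASING_SINCE
                        ? token.toLowerCase(Locale.ROOT)
                        : token);
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter(24);
        // The caller fetches the version and passes it on explicitly.
        SketchAnalyzer analyzer = new SketchAnalyzer(writer.getActsAsVersion());
        System.out.println(analyzer.tokenize("Hello World"));
    }
}
```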

(That said, unrelated to this discussion, I would actually like to
record per-segment which version of Lucene wrote the segment; this
would be very helpful when debugging issues like LUCENE-1474 where I
need to know if the segments were written by 2.4.0 or 2.4.1).

> Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> is that where the sneakiness gets overwhelming?

Per-class actsAsVersion would work well here -- PFAW would just
forward the required version when requesting the tokenStream.
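
Roughly what I mean by forwarding, as a sketch under assumptions. The
interface VersionAwareAnalyzer, the class PerFieldWrapper, and the version
numbers are all invented for this example and are not the real
PerFieldAnalyzerWrapper API. The wrapper makes no version decision of its
own; it just hands the requested version to whichever delegate owns the
field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these types are illustrative, not Lucene's API.
public class ForwardingWrapperSketch {

    /** Assumed shape: the required version is passed on each request. */
    interface VersionAwareAnalyzer {
        List<String> tokenize(int actsAsVersion, String field, String text);
    }

    /** Picks a delegate per field and forwards the version unchanged. */
    static final class PerFieldWrapper implements VersionAwareAnalyzer {
        private final VersionAwareAnalyzer defaultAnalyzer;
        private final Map<String, VersionAwareAnalyzer> perField = new HashMap<>();

        PerFieldWrapper(VersionAwareAnalyzer defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        void addAnalyzer(String field, VersionAwareAnalyzer analyzer) {
            perField.put(field, analyzer);
        }

        @Override
        public List<String> tokenize(int actsAsVersion, String field, String text) {
            VersionAwareAnalyzer delegate = perField.getOrDefault(field, defaultAnalyzer);
            return delegate.tokenize(actsAsVersion, field, text);
        }
    }

    static List<String> splitOnWhitespace(String text, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(lowercase ? token.toLowerCase(Locale.ROOT) : token);
            }
        }
        return tokens;
    }

    /** Builds a wrapper whose "body" field uses an analyzer that changed in "version 29". */
    static PerFieldWrapper demoWrapper() {
        // Never changed across releases, so it ignores the version.
        VersionAwareAnalyzer whitespace = (v, f, t) -> splitOnWhitespace(t, false);
        // Assumed to have started lowercasing in made-up version 29.
        VersionAwareAnalyzer changed = (v, f, t) -> splitOnWhitespace(t, v >= 29);
        PerFieldWrapper wrapper = new PerFieldWrapper(whitespace);
        wrapper.addAnalyzer("body", changed);
        return wrapper;
    }

    public static void main(String[] args) {
        PerFieldWrapper wrapper = demoWrapper();
        System.out.println(wrapper.tokenize(24, "body", "Hello World"));
        System.out.println(wrapper.tokenize(29, "body", "Hello World"));
        System.out.println(wrapper.tokenize(29, "id", "Hello World"));
    }
}
```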

>> I prefer the up-front "you specify actsAsVersion" when you
>> create the analyzer, only for analyzers that have changed across
>> releases.  So things like WhitespaceAnalyzer would likely never need
>> an actsAsVersion arg.
>
> Hmm, this is kind of hard.  I'd prefer that the argument remain optional, so
> that new users don't have to think about it.

I wouldn't mind optional, but only if it defaults to latest and
greatest.  The goal here is to have new users always see the best of
Lucene when they start out.
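
As a sketch of "optional, but defaulting to latest": the class name, the
LATEST_VERSION constant, and its value below are assumptions made up for
illustration, not anything in Lucene today. New users call the no-arg
constructor and get current behavior; users with an existing index pin the
older version explicitly.

```java
// Hypothetical sketch: names and version numbers are illustrative only.
public class DefaultLatestSketch {

    /** Made-up constant standing for the newest behavior this release knows. */
    static final int LATEST_VERSION = 31;

    static final class SketchAnalyzer {
        private final int actsAsVersion;

        /** New users: no argument, so they get the latest behavior. */
        SketchAnalyzer() {
            this(LATEST_VERSION);
        }

        /** Existing users: pin the version the index was built with. */
        SketchAnalyzer(int actsAsVersion) {
            if (actsAsVersion > LATEST_VERSION) {
                throw new IllegalArgumentException(
                        "unknown version: " + actsAsVersion);
            }
            this.actsAsVersion = actsAsVersion;
        }

        int getActsAsVersion() {
            return actsAsVersion;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SketchAnalyzer().getActsAsVersion());
        System.out.println(new SketchAnalyzer(24).getActsAsVersion());
    }
}
```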

Mike

