Re: Lucene's default settings & back compatibility

Michael McCandless Tue, 19 May 2009 05:19:55 -0700

On Tue, May 19, 2009 at 7:26 AM, Grant Ingersoll <[email protected]> wrote:


> I don't think we have said that bug fixes are required to be back
> compatible, even if it is in analysis.  I think it is a really bad idea for
> TokenStreams to have if clauses in them checking boolean values for old
> versus new behaviors.
>
> When they can be back compat, we do, but there is not a requirement.  For
> instance, we upgraded Snowball.

True (Snowball), but then we have discussions like this:

  
https://issues.apache.org/jira/browse/LUCENE-1068?focusedCommentId=12550948&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12550948

which added a confusing deprecated "boolean replaceDepAcronym =
false;" to StandardAnalyzer.  Something similar led to
StandardAnalyzer.replaceInvalidAcronym.

I think there have been other cases (in particular StandardAnalyzer,
QueryParser) over time, but I haven't tracked them down.  Analyzer
back compat after fixing issues is especially tricky since the bugs
get "cached" into the index and queries against that index using the
fixed analyzer may no longer match the docs.  (So I think back-compat
is important in Analyzers).

> Or, the removal of StopFilter as "Standard" altogether.  This coupled with
> a QP that created phrases around stop words is a better solution.

Interesting... that'd be a pretty big change to StandardAnalyzer,
though.

I can see we are spinning off lots of neat ideas, decoupled from the
"Settings" proposal, here :)

> For instance, if we removed the StopFilter from the StandardAnalyzer, then
> what?  A Settings object would not be able to account for it.

Why not?  The settings object could have, say, a property
"analysis.standard.enableStopFilter"?
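
A minimal sketch of how such a property might be consulted, assuming a
hypothetical settings holder.  SettingsSketch and AnalyzerChainSketch are
made-up names for illustration, not existing Lucene classes, and the chain
is only described by stage name rather than built from real TokenStreams.
The default of "true" is an assumption that keeps today's behavior:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these are illustrative names, not Lucene classes.
class SettingsSketch {
    private final Map<String, Boolean> flags = new HashMap<String, Boolean>();

    // Binds a boolean setting; returns this so calls can be chained.
    SettingsSketch setBoolean(String key, boolean value) {
        flags.put(key, Boolean.valueOf(value));
        return this;
    }

    // Returns the bound value, or defaultValue if the key was never set.
    boolean getBoolean(String key, boolean defaultValue) {
        Boolean value = flags.get(key);
        return value != null ? value.booleanValue() : defaultValue;
    }
}

class AnalyzerChainSketch {
    static final String ENABLE_STOP_FILTER = "analysis.standard.enableStopFilter";

    // Lists the stages a StandardAnalyzer-like chain would apply, in order.
    // The stop filter stays on unless the setting turns it off (assumed
    // default), so existing indexes keep seeing the same tokens.
    static List<String> describeChain(SettingsSketch settings) {
        List<String> chain = new ArrayList<String>();
        chain.add("StandardTokenizer");
        chain.add("StandardFilter");
        chain.add("LowerCaseFilter");
        if (settings.getBoolean(ENABLE_STOP_FILTER, true)) {
            chain.add("StopFilter");
        }
        return chain;
    }
}
```

The setting is read once, when the chain is assembled, so nothing inside
the per-token path has to look it up.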

> Likewise, the subtler issue of "fixing" a TokenStream such that it
> might produce different tokens.

Settings should cover this in general, I think.

> I really worry about Settings objects having to be repeatedly checked inside
> of tight inner loops.  Even looking at the new TokenStream stuff, there are
> now checks for the "new API" in an area that is called _a lot_ of times.

Agreed, but I'd say this is orthogonal.  We should never do slow
things inside inner loops -- checking settings, calling logging
frameworks, calling List.size(), opening files, etc.  This is the
stuff of standard coding practices...
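
To illustrate the point about inner loops, here is a rough sketch that
reads a setting once and keeps only a local boolean inside the loop.
HoistedSettingSketch and filterTokens are hypothetical names, the settings
are modeled as a plain Map for brevity, and treating a missing key as
"enabled" is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: not a Lucene class.
class HoistedSettingSketch {
    static List<String> filterTokens(String[] tokens,
                                     Set<String> stopWords,
                                     Map<String, Boolean> settings) {
        // Look the setting up once, before the loop.  A missing key is
        // treated as "enabled" (assumed default).
        final boolean removeStops =
            !Boolean.FALSE.equals(settings.get("analysis.standard.enableStopFilter"));

        List<String> kept = new ArrayList<String>(tokens.length);
        for (String token : tokens) {
            // Only the cached local is tested per token; no map lookup here.
            if (removeStops && stopWords.contains(token)) {
                continue;
            }
            kept.add(token);
        }
        return kept;
    }
}
```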

> Last, and mostly I mention it as an afterthought.  How are you going to
> handle changes to the Settings?  Say, for instance, we come out w/
> Settings2.4, release it and then we realize we missed something (and this
> seems likely given the number of settings available in Lucene), then
> what?
>
> We deprecate Settings2.4 and come out with TheRealSettingsFor2.4?  And then
> when that is incomplete?

Well, in 2.9 there would still be a Settings2.4 class, but it'd have
newly created (in 2.9) settings with their defaults bound.

So in 2.9, when sorting by field you can optionally turn off scoring.
It gives a sizable performance boost doing so.  We of course were
forced to leave scoring on for back compat, but if we had this
Settings class in place, what we would have done instead is add a new
"search.sort.trackScores" (and "trackMaxScore") setting to the base
Settings class, with Settings2.4 binding it to true.

There should be no need to make a new class for 2.4's settings on
releasing 2.9?
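
A rough sketch of that binding, under stated assumptions: BaseSettings and
Settings24 are hypothetical names ("Settings2.4" is not a legal Java
identifier), "search.sort.trackMaxScore" is a guess at the full key for
"trackMaxScore", and the base defaults of "false" are an assumption about
what 2.9 would have picked had it been free to:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: neither class exists in Lucene.
class BaseSettings {
    protected final Map<String, Boolean> values = new HashMap<String, Boolean>();

    BaseSettings() {
        // Settings introduced in 2.9, with the (assumed) new defaults:
        // skip scoring when sorting by field.
        values.put("search.sort.trackScores", Boolean.FALSE);
        values.put("search.sort.trackMaxScore", Boolean.FALSE);
    }

    boolean getBoolean(String key) {
        Boolean value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Unknown setting: " + key);
        }
        return value.booleanValue();
    }
}

// The 2.4 profile keeps its name across releases.  When 2.9 adds a setting,
// this class gains a binding that preserves the old 2.4 behavior.
class Settings24 extends BaseSettings {
    Settings24() {
        super();
        values.put("search.sort.trackScores", Boolean.TRUE);
        values.put("search.sort.trackMaxScore", Boolean.TRUE);
    }
}
```

An application that asked for the 2.4 profile would keep getting scores
when sorting after upgrading, while one using the base defaults would get
the faster path.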

> I still think we would benefit from just communicating upcoming changes
> better even in minor releases, thereby allowing for a bit more variance in
> back compat.  It should be the exception, not the rule.

I like DM's point, that this Settings class would be a great vehicle
for exactly that communication.  Rather than poring over a
CHANGES.txt, you can see setting-by-setting what changed, and why.

Mike
