On Mon, Mar 11, 2013 at 4:31 PM, Nick Wellnhofer <[email protected]> wrote:

> We could simply include the type of the regex engine and even the particular
> version in the serialization and equality test of RegexTokenizer. This
> should safely guard against using different engines for indexing.
>
> Whether regex engines are selected at compile time, by choosing a dedicated
> class, or by a constructor argument shouldn't make a difference then. I'd
> prefer the latter approach.

Having the equality test fail leads to a harsh consequence, though -- when the
regex engine changes (e.g. because you upgraded the host language) all your
apps start throwing exceptions as soon as they try to open the index.  And
it's quite possible that the changes in the regex behavior don't even affect
your app or cause only minor degradation.

This problem of degraded recall is the same one we face with all Analyzer
behavior changes.  The best solution for many users is to live with a window
of inferior search results while refreshing the index after an upgrade.

If we introduce extra fields specifying the engine and possibly the version,
how about leaving them undefined by default, falling back to whatever is
available?  It's a little strange to have an opt-in which ties your hands on
upgrade, though...
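
To make the opt-in idea concrete, here is a rough sketch in Python rather than in Lucy's own code. Everything in it is hypothetical: the class name `RegexTokenizerSpec`, the `engine` and `engine_version` fields, and the `matches_runtime` check are illustrations, not existing Lucy API. Fields left undefined are omitted from the dump and match whatever engine is available; only explicitly pinned fields can cause a mismatch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegexTokenizerSpec:
    """Hypothetical stand-in for the serialized form of a RegexTokenizer."""

    pattern: str
    engine: Optional[str] = None          # None: fall back to whatever is available
    engine_version: Optional[str] = None  # None: accept any version

    def dump(self) -> dict:
        # Only emit the engine fields the user explicitly pinned.
        out = {"pattern": self.pattern}
        if self.engine is not None:
            out["engine"] = self.engine
        if self.engine_version is not None:
            out["engine_version"] = self.engine_version
        return out

    @classmethod
    def load(cls, data: dict) -> "RegexTokenizerSpec":
        # Missing keys stay undefined, so older dumps keep loading.
        return cls(data["pattern"], data.get("engine"), data.get("engine_version"))

    def matches_runtime(self, engine: str, version: str) -> bool:
        # An undefined field never causes a mismatch; a pinned one must agree.
        if self.engine is not None and self.engine != engine:
            return False
        if self.engine_version is not None and self.engine_version != version:
            return False
        return True
```

With the defaults, upgrading the host language changes nothing at open time. A user who pins both fields gets the strict behavior, including the failure on upgrade described above.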

Fortunately, with StandardTokenizer and EasyAnalyzer moving to the forefront
in all our sample code, we can assume that fewer people use RegexTokenizer
these days and all this matters less than it once did. :)

Marvin Humphrey
