On Mon, Mar 11, 2013 at 4:31 PM, Nick Wellnhofer <[email protected]> wrote:
> We could simply include the type of the regex engine and even the particular
> version in the serialization and equality test of RegexTokenizer. This
> should safely guard against using different engines for indexing.
>
> Whether regex engines are selected at compile time, by choosing a dedicated
> class, or by a constructor argument shouldn't make a difference then. I'd
> prefer the latter approach.

Having the equality test fail leads to a harsh consequence, though -- when the
regex engine changes (e.g. because you upgraded the host language) all your
apps start throwing exceptions as soon as they try to open the index. And
it's quite possible that the changes in the regex behavior don't even affect
your app or cause only minor degradation.

This problem of degraded recall is the same one we face with all Analyzer
behavior changes. The best solution for many users is to live with a window
of inferior search results while refreshing the index after an upgrade.

If we introduce extra fields specifying the engine and possibly the version,
how about leaving them undefined by default, falling back to whatever is
available? It's a little strange to have an opt-in which ties your hands on
upgrade, though...

Fortunately, with StandardTokenizer and EasyAnalyzer moving to the forefront
in all our sample code, we can assume that fewer people use RegexTokenizer
these days and all this matters less than it once did. :)

Marvin Humphrey
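
P.S. To make the opt-in idea concrete, here is a rough sketch. It is written in Python for brevity and is not Lucy's API: the class name RegexTokenizerSketch, the fields engine and engine_version, and the method compatible_with are all hypothetical. The assumption is that an undefined field accepts whatever engine is available, while a pinned field must match exactly.

```python
import re
from typing import Optional


class RegexTokenizerSketch:
    """Illustrative stand-in only; not the real RegexTokenizer."""

    def __init__(self, pattern: str = r"\w+",
                 engine: Optional[str] = None,
                 engine_version: Optional[str] = None):
        self.pattern = pattern
        # None means "not pinned": fall back to whatever is available.
        self.engine = engine
        self.engine_version = engine_version
        self._regex = re.compile(pattern)

    def dump(self) -> dict:
        # Serialize the pattern plus any pins, so the index records them.
        return {
            "pattern": self.pattern,
            "engine": self.engine,
            "engine_version": self.engine_version,
        }

    @classmethod
    def load(cls, data: dict) -> "RegexTokenizerSketch":
        return cls(data["pattern"], data.get("engine"),
                   data.get("engine_version"))

    def compatible_with(self, engine: str, engine_version: str) -> bool:
        # A pinned field must match the runtime; an undefined one
        # accepts anything, so upgrades do not break unpinned indexes.
        if self.engine is not None and self.engine != engine:
            return False
        if (self.engine_version is not None
                and self.engine_version != engine_version):
            return False
        return True

    def tokenize(self, text: str) -> list:
        return self._regex.findall(text)
```

With nothing pinned, compatible_with succeeds for any engine and version, which is the "live with degraded recall until you reindex" behavior. Pinning engine_version gives the strict check from the quoted proposal, at the cost of failing on upgrade.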
