Re: LUCENE-1515

DM Smith Sat, 02 Jan 2010 05:17:22 -0800

On Jan 2, 2010, at 7:46 AM, Robert Muir wrote:

>> I also want backward compatibility. Or at least control over it. That is, I 
>> need for indexes to work fully but want an easy path to upgrade/replace an 
>> index with better analyzer/filter combos. This stemmer is not backward 
>> compatible.
> 
> But the Analyzers can be (we can have the old stemmer available also),
> and if we create an analyzers/sv or whatever that uses the
> SmartSwedishStemmer or whatever it's named, it's not a back compat break
> as long as you can still use SnowballAnalyzer("Swedish") and get the
> old one, right?


Right. New code is not a backward compatibility break. Replacing the 
SnowballAnalyzer("Swedish") with this one would be.

The problem I have with names like "SmartSwedishStemmer" comes when a better 
solution comes along. What then? "SmarterSwedishStemmer", 
"BrilliantSwedishStemmer", "BetterSwedishStemmer"? The same goes for any other 
descriptive name.

I guess I'd want 
OutOfTheBoxBestLatestGreatestAndMostHighlyRecommendedSwedishStemmer to stay the 
out-of-the-box best, latest, greatest and most highly recommended Swedish 
stemmer by Version ;)


> 
> For what I think is a first example of improving analyzers with Version, check
> out the modifications to CzechAnalyzer, with that one we added a
> stemmer where there was not one before, but the stemming only takes
> place with Version >= 3.1 by default. In my opinion we should exploit
> Version to improve analyzers, based upon relevance testing or
> published relevance results if at all possible, of course.

This is good and should be pursued. Under this approach, only one name is needed.
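A minimal sketch of the Version-gated idea: one analyzer name, with the requested version deciding which chain is built, as in the CzechAnalyzer case where stemming only applies from 3.1 on. Everything here is hypothetical: `MatchVersion` is a stand-in enum, not Lucene's `Version` class, and the chain is modeled as a list of component names rather than real `TokenStream` objects; the component names are illustrative, not CzechAnalyzer's exact chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one analyzer name, behavior selected by a match version.
final class VersionGatedChain {

    // Stand-in for a version constant; NOT Lucene's Version class.
    enum MatchVersion {
        V29, V30, V31;

        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0; // declaration order encodes release order
        }
    }

    // Returns the ordered component names the analyzer would apply.
    static List<String> buildChain(MatchVersion matchVersion) {
        List<String> chain = new ArrayList<>();
        chain.add("StandardTokenizer");
        chain.add("LowerCaseFilter");
        chain.add("StopFilter");
        // Stemming is enabled only for V31 and later, so indexes built with an
        // older version keep seeing exactly the tokens they were built with.
        if (matchVersion.onOrAfter(MatchVersion.V31)) {
            chain.add("CzechStemFilter");
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(buildChain(MatchVersion.V30));
        System.out.println(buildChain(MatchVersion.V31));
    }
}
```

The caller never has to pick between "Smart", "Smarter" or "Brilliant" names; improvements are appended behind a version check, and old versions keep their old behavior.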

However, this solves only half of the problem.

The index does not have metadata recording what it requires to maintain 
backward compatibility. To have full backward compatibility, one may (or must) 
know that a particular index was built with:
A particular JRE or Unicode version.
A particular tokenizer (i.e., tokenizer version).
A particular stop word list.
A particular ordered chain of filters.
...

And then searches need to use that same software blend.

There is incipient support for storing this in the index, but it is the 
responsibility of users to determine what metadata is needed for their indexes 
and to create a custom representation of it.
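One possible shape for such a custom representation, sketched under assumptions: the index can hold a flat String-to-String map (how it gets stored is not shown), and the `analysis.*` keys and the `AnalysisMetadata` class are hypothetical, not an existing Lucene API. It records the items listed above and treats a searcher as compatible only if every one of them matches, including filter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical descriptor of everything that shaped an index's tokens.
final class AnalysisMetadata {
    final String unicodeVersion;
    final String tokenizer;      // name plus version, e.g. "StandardTokenizer@3.0"
    final String stopWords;      // name or checksum of the stop word list
    final List<String> filters;  // order matters

    AnalysisMetadata(String unicodeVersion, String tokenizer,
                     String stopWords, List<String> filters) {
        this.unicodeVersion = unicodeVersion;
        this.tokenizer = tokenizer;
        this.stopWords = stopWords;
        this.filters = new ArrayList<>(filters);
    }

    // Flatten to String keys/values so it fits a generic per-index map.
    // Assumes filter names contain no commas.
    Map<String, String> toMap() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("analysis.unicode", unicodeVersion);
        m.put("analysis.tokenizer", tokenizer);
        m.put("analysis.stopwords", stopWords);
        m.put("analysis.filters", String.join(",", filters));
        return m;
    }

    static AnalysisMetadata fromMap(Map<String, String> m) {
        String f = m.get("analysis.filters");
        List<String> filters = (f == null || f.isEmpty())
                ? new ArrayList<>()
                : Arrays.asList(f.split(","));
        return new AnalysisMetadata(m.get("analysis.unicode"),
                m.get("analysis.tokenizer"), m.get("analysis.stopwords"), filters);
    }

    // A searcher is compatible only if every recorded component matches.
    boolean compatibleWith(AnalysisMetadata other) {
        return Objects.equals(unicodeVersion, other.unicodeVersion)
                && Objects.equals(tokenizer, other.tokenizer)
                && Objects.equals(stopWords, other.stopWords)
                && filters.equals(other.filters);
    }
}
```

Comparing the ordered filter list, not a set, is deliberate: the same filters in a different order can produce different tokens.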

I think Marvin mentioned a methodology in Lucy to capture that info and to use 
it to build the analyzer.
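The message does not describe how Lucy does this, so the following is only a generic illustration, not Lucy's design: rebuild the analysis chain from the filter names recorded with an index, applying them in recorded order and refusing to guess when a recorded component is unavailable. The registry, the filter names `lowercase` and `stop:en-small`, and the whitespace tokenizer are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical: rebuild an analysis chain from the filter names stored with an index.
final class ChainFromMetadata {

    // Registry of known filters; each maps a token list to a new token list.
    private static final Map<String, UnaryOperator<List<String>>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("lowercase", tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        });
        REGISTRY.put("stop:en-small", tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) if (!t.equals("the") && !t.equals("a")) out.add(t);
            return out;
        });
    }

    // Builds the chain in recorded order; rejects unknown names instead of guessing.
    static UnaryOperator<List<String>> build(List<String> recordedFilters) {
        List<UnaryOperator<List<String>>> steps = new ArrayList<>();
        for (String name : recordedFilters) {
            UnaryOperator<List<String>> f = REGISTRY.get(name);
            if (f == null) {
                throw new IllegalStateException("index needs unavailable filter: " + name);
            }
            steps.add(f);
        }
        return tokens -> {
            List<String> current = tokens;
            for (UnaryOperator<List<String>> step : steps) current = step.apply(current);
            return current;
        };
    }

    // Naive whitespace tokenizer, standing in for a real tokenizer.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) if (!t.isEmpty()) out.add(t);
        return out;
    }
}
```

With "The Quick Fox", the recorded order `lowercase, stop:en-small` yields quick and fox, while the reverse order keeps "the", which is why the order itself has to be part of the stored metadata.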

-- DM


