This would be a little strange, since the current SnowballAnalyzer uses
reflection to look up the stemmer class by name. In my opinion,
when/if Snowball is merged, this reflection-based SnowballAnalyzer
should be deprecated.
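
To make the concern concrete, here is a rough sketch of what a
name-based reflective lookup looks like. Everything in it
(ReflectiveStemmerLookup, the Stemmer interface, the toy stemming
rules) is made up for illustration and is not the actual
SnowballAnalyzer code. The point is only that a misspelled language
name cannot be caught until runtime.

```java
public class ReflectiveStemmerLookup {

    /** Hypothetical stand-in for a stemmer base type. */
    public interface Stemmer {
        String stem(String word);
    }

    /** Toy rule for illustration only, not a real stemming algorithm. */
    public static class EnglishStemmer implements Stemmer {
        public String stem(String word) {
            return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
        }
    }

    /** Toy rule for illustration only, not a real stemming algorithm. */
    public static class SwedishStemmer implements Stemmer {
        public String stem(String word) {
            return word.endsWith("en") ? word.substring(0, word.length() - 2) : word;
        }
    }

    /** Builds a class name from the language name and instantiates it reflectively. */
    public static Stemmer forName(String name) {
        String className = ReflectiveStemmerLookup.class.getName() + "$" + name + "Stemmer";
        try {
            return (Stemmer) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // A typo such as "Sweedish" only surfaces here, at runtime.
            throw new IllegalArgumentException("No stemmer named " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forName("English").stem("cats"));
        System.out.println(forName("Swedish").stem("bilen"));
    }
}
```

With a dedicated analyzer class per language, the same mistake would
be a compile error instead.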

When/if we merge Snowball we may want to rethink this: personally, I
would like a tighter integration of Snowball with contrib/analyzers.

In my opinion, what is more user-friendly is to have a package for
each language, with the analyzer using the 'best' defaults for that
language.
In the case of Swedish, this could be the new stemmer, as it would not
affect back compat, and the old Snowball one would still be available.
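
As a sketch of how a per-language default could stay back compatible,
the choice of stemmer can be keyed on the version, along the lines of
the matchVersion idea quoted below. The class name, the Version enum,
and the returned labels are all placeholders I made up for this
example; the new stemmer has no agreed name yet.

```java
public class SwedishDefaults {

    /** Hypothetical stand-in for a release-version enum, not the real Lucene class. */
    public enum Version { LUCENE_30, LUCENE_31 }

    /**
     * Picks the default stemmer for Swedish. Older versions keep the
     * Snowball stemmer so existing indexes still match; newer versions
     * get the new one. The returned strings are placeholder labels.
     */
    public static String defaultStemmer(Version matchVersion) {
        // Enum.compareTo orders constants by declaration order.
        return matchVersion.compareTo(Version.LUCENE_31) >= 0
                ? "new-swedish-stemmer"
                : "snowball/SwedishStemmer";
    }

    public static void main(String[] args) {
        for (Version v : Version.values()) {
            System.out.println(v + " -> " + defaultStemmer(v));
        }
    }
}
```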

I still think all the options should be there, but it would be really
nice to have an out-of-the-box Analyzer/template for each language
with good default settings. For some languages this might mean using a
Snowball stemmer, and for others it might not. For German we would try
to figure out which of the 3 is the best default
(analyzers/de/GermanStemmer, snowball/GermanStemmer,
snowball/German2Stemmer).

I also think that even for the Snowball-supported languages there is
more involved than just stemming. See what it does now for Turkish as
an example: it needs to use a special lowercasing. In the case of
German (and maybe Swedish too!) we may want to add decompounding in
the future, depending on version. I think this kind of setup would be
better than a monolithic SnowballAnalyzer going forward, and easier on
the users.
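
For anyone unfamiliar with the Turkish case, the problem is the
dotted/dotless i: uppercase "I" lowercases to dotless "ı" (U+0131) in
Turkish, not to "i". The sketch below shows this with the JDK's
locale-aware String.toLowerCase only; it is not the Lucene filter, and
the class and method names are just for this example.

```java
import java.util.Locale;

public class TurkishLowercase {

    /** Locale used for Turkish casing rules. */
    public static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Lowercases text using the casing rules of the given locale. */
    public static String lower(String text, Locale locale) {
        return text.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String word = "ISPARTA";
        // Language-neutral rules map 'I' to 'i'.
        System.out.println(lower(word, Locale.ROOT));
        // Turkish rules map 'I' to dotless 'ı' (U+0131).
        System.out.println(lower(word, TURKISH));
    }
}
```

A generic lowercase step before the stemmer would produce the first
form, which is the wrong term for Turkish text.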

On Fri, Jan 1, 2010 at 4:51 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> I guess since apache con we all agree on not using any of those
>> ambiguous terms for class naming anymore! Yet, before we think about a
>> name we should rather check if we can make this new functionality
>> optional in the already existing code. Each time I see German2Stemmer
>> it reminds me of this code duplication inside snowball / analyzers
>> which needs cleanup. LUCENE-1515 has been around for some time so we
>> should not rush with a commit until we have found a good solution
>> hopefully without having a Swedish2Stemmer.java.
>
> I thought we wanted to add the new stemmer and use it with matchVersion >=
> LUCENE_31? That would be the best for all users, existing indexes would work
> and new users can use the new stemmer.
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>



-- 
Robert Muir
rcm...@gmail.com
