Re: Analyzer enquiry

Erick Erickson Sun, 13 Mar 2011 19:06:51 -0700

StandardAnalyzer works well for most European languages. The problem will
be stemming. Applying stemming via English rules to non-English languages
produces...er...interesting results.


You can go ahead and create language-specific fields for each language and
use StandardAnalyzer with the appropriate stopwords and stemming with each,
this is a common approach.. The Snowball stemmer takes a language parameter...

You need to use specific analyzers for Chinese Japanese Korean (CJK) documents
though.

Hope that helps
Erick

On Sun, Mar 13, 2011 at 7:23 PM, Vasiliki Gkouta <vgko...@csd.auth.gr> wrote:
> Hello everybody,
>
> I have an enquiry about StandardAnalyzer. Can I use it for other languages
> except from English? I give the right list of stop words at initialization.
> Is there anything else inside the class that is by default set in English?
> I've found the Analyzers for other languages too but they where seem to be
> deprecated.. Moreover I use english and other languages, all together in my
> project so I would like to ask if there is a way to use either the same
> class analyzer for all of them, or analyzers of the same functionality for
> all the languages. Thanks in advance!
>
> Best regards,
> Vicky
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Analyzer enquiry

Reply via email to