Analyzers and multiple languages

Antony Bowesman Fri, 13 Oct 2006 00:43:43 -0700

Hello,

I'm new to Lucene and wanted some advice on analyzers, stemmers and language analysis. I've got LIA (Lucene in Action), so I have read its chapters on these topics.

I am writing a framework that needs to index documents from a range of languages where only the character set of each document is known. Has anyone looked at, or is anyone using, language analysis to determine the language of a document in ISO-8859-1?
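To make the language-detection question concrete, here is a minimal sketch of one simple approach: decode the bytes with the known charset (ISO-8859-1), then pick the language whose common stop words appear most often. The class name `LanguageGuesser`, the four tiny stop-word profiles and the scoring rule are all hypothetical and for illustration only; more robust detectors typically compare character n-gram statistics rather than a handful of stop words, and no Lucene API is used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: guesses a language by counting common stop words. */
public class LanguageGuesser {
    // Tiny, illustrative stop-word profiles (assumption, not a real resource).
    private static final Map<String, Set<String>> PROFILES = new LinkedHashMap<>();
    static {
        PROFILES.put("en", Set.of("the", "and", "of", "to", "is", "in", "that", "it"));
        PROFILES.put("de", Set.of("der", "die", "das", "und", "ist", "nicht", "ein", "zu"));
        PROFILES.put("fr", Set.of("le", "la", "les", "et", "est", "un", "une", "des"));
        PROFILES.put("es", Set.of("el", "los", "las", "y", "es", "un", "una", "que"));
    }

    /** Returns the code of the best-scoring profile, or "unknown" if nothing matched. */
    public static String guess(String text) {
        // Split on any run of non-letters (Unicode-aware, so accented letters survive).
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\P{L}+");
        String best = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> profile : PROFILES.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (profile.getValue().contains(token)) {
                    score++;
                }
            }
            // Strictly greater: on a tie the earlier profile wins.
            if (score > bestScore) {
                bestScore = score;
                best = profile.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Only the charset is known, so decode the raw bytes as ISO-8859-1 first.
        byte[] raw = "Le chat est dans la maison".getBytes(StandardCharsets.ISO_8859_1);
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(guess(text)); // fr
    }
}
```

A heuristic like this becomes unreliable on very short or mixed-language documents, which is worth keeping in mind when choosing an analyzer per document.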

Is it worth doing, or does StandardAnalyzer cope well with most European languages as long as it is given a suitable multilingual set of stop words?
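One trade-off of a single merged stop list can be shown with a simplified stand-in for an analysis chain (split on non-letters, lowercase, drop stop words). This is plain Java, not Lucene's StandardAnalyzer, whose tokenizer is considerably more sophisticated; the class name `MultilingualStopFilter` and the small merged word list are hypothetical. The point it illustrates: a word that is a stop word in one language can be a content word in another, so a merged list silently removes it everywhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical, simplified stand-in for tokenizer + lowercase + stop filter. */
public class MultilingualStopFilter {
    // Merged English/German/French stop words (illustrative subset only).
    static final Set<String> STOP_WORDS = Set.of(
            "the", "and", "of", "in",   // English
            "der", "die", "und",        // German
            "le", "la", "les", "et");   // French

    /** Splits on non-letters, lowercases, and drops anything in the merged stop list. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // "die" is a German article but an English verb, so the merged list drops it here too.
        System.out.println(analyze("The soldiers die in the war")); // [soldiers, war]
    }
}
```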

What about stemming? I see Google now says it does stemming, but again language detection seems to be a stumbling block to choosing the right stemmer. Does stemming reduce index size much, and is it actually useful in search?
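The effect of stemming on the term dictionary can be sketched with a deliberately crude, English-only suffix stripper. This is NOT the Porter or Snowball algorithm; the class `ToyStemmer` and its suffix list are hypothetical. It shows that inflected forms collapse into one term, so the vocabulary shrinks and a query for "connection" can match "connected". The number of term occurrences in the text does not change, and the rules only make sense for English, which is why knowing the document language matters when picking a stemmer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy suffix-stripping stemmer (English only, NOT Porter/Snowball). */
public class ToyStemmer {
    // Longest suffixes first so "ions" is tried before "s".
    private static final String[] SUFFIXES = {"ions", "ion", "ing", "ed", "s"};

    public static String stem(String word) {
        for (String suffix : SUFFIXES) {
            // Keep at least three characters so short words are not mangled.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Counts distinct terms, optionally after stemming. */
    public static int distinctTerms(List<String> words, boolean stemmed) {
        Set<String> terms = new HashSet<>();
        for (String w : words) {
            terms.add(stemmed ? stem(w) : w);
        }
        return terms.size();
    }

    public static void main(String[] args) {
        List<String> words = List.of("connect", "connected", "connecting", "connection", "connections");
        System.out.println(distinctTerms(words, false)); // 5
        System.out.println(distinctTerms(words, true));  // 1
    }
}
```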


Antony

