On Tue, Mar 22, 2011 at 04:15:53PM +0100, fr.jur...@voila.fr wrote:
> The only thing I need is the middle layer: a Java component extending
> Lucene, that'd pull a plausible Analyzer out of its magic hat, for every
> ISO 639-1 language tag however unlikely that turns up in the RDF input.
> Not just an analyzer, mark: a *plausible* one. I mean one that'll
> generate usable indexes right out of the box in most cases; I cannot
> afford to bring the system back into dev & study the arcanes of
> automatic indexing in just every language we're working with. So
> lucene-contrib + covering the rest with StandardAnalyzer/English
> stopwords is not an option.

Hi,

maybe you could have a look at java.text.*, and specifically at
BreakIterator (the Thai analyzer uses it). It could be a better fallback
than StandardAnalyzer.

Don't forget that if you use multiple analyzers at index time, you'll
have to use multiple analyzers at query time as well (that is the tricky
part of the process).

Regards.

--
David Causse
Spotter
http://www.spotter.com/
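
To illustrate the BreakIterator suggestion, here is a minimal sketch that uses only the JDK, not Lucene. The class name BreakIteratorTokens and the tokenize helper are made up for this example, and the letter-or-digit filter is an assumption about what counts as a usable token. Wrapping this in an actual Lucene Tokenizer/Analyzer is left out, since that API depends on the Lucene version in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: locale-aware word splitting with the JDK BreakIterator.
public class BreakIteratorTokens {

    // Splits text at the word boundaries BreakIterator reports for the locale.
    static List<String> tokenize(String text, Locale locale) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = text.substring(start, end);
            // Assumption: keep only segments containing a letter or digit,
            // which drops whitespace and punctuation segments.
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Locale built from a language tag such as one found in the RDF input.
        Locale locale = Locale.forLanguageTag("fr");
        System.out.println(tokenize("Bonjour, le monde !", locale));
    }
}
```

Whatever tokenization is picked per language at index time, the same choice would have to be made for the query text, so the language tag needs to be available on the query side too.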