Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

David Causse Tue, 22 Mar 2011 09:40:18 -0700

On Tue, Mar 22, 2011 at 04:15:53PM +0100, [email protected] wrote:
> The only thing I need is the middle layer: a Java component extending 
> Lucene, that'd pull a plausible Analyzer out of its magic hat, for every 
> ISO 639-1 language tag however unlikely that turns up in the RDF input.
> Not just an analyzer, mark: a *plausible* one. I mean one that'll 
> generate usable indexes right out of the box in most cases; I cannot 
> afford to bring the system back into dev & study the arcanes of 
> automatic indexing in just every language we're working with. So 
> lucene-contrib + covering the rest with StandardAnalyzer/English 
> stopwords is not an option.


Hi,

Maybe you could have a look at java.text.*, and specifically at
BreakIterator (the Thai analyzer uses it); it could be a better fallback
than StandardAnalyzer. Don't forget that if you use multiple analyzers at
index time, you'll have to use the same per-language analyzers at query
time (the tricky part of the process).
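
A minimal sketch of the BreakIterator idea, assuming all you need from the
fallback is locale-aware word segmentation. It uses only the JDK, not the
Lucene API; the class name FallbackTokenizer and the method tokenize are
made up for illustration, and wrapping this in a real Lucene Tokenizer is
left out.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: a last-resort tokenizer built on
// java.text.BreakIterator for languages that have no dedicated analyzer.
public class FallbackTokenizer {

    // Splits text into lowercased word tokens, using the word-boundary rules
    // the JDK provides for the given language tag (e.g. "en", "fr", "th").
    static List<String> tokenize(String text, String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);

        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE;
                start = end, end = words.next()) {
            String candidate = text.substring(start, end);
            // BreakIterator also returns whitespace and punctuation segments;
            // keep only segments that contain at least one letter or digit.
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate.toLowerCase(locale));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The same language tag must be used when indexing and when querying.
        System.out.println(tokenize("Hello, RDF world!", "en"));
    }
}
```

Whatever maps a language tag to an analyzer at index time should be the
single lookup used at query time as well, so both sides produce the same
tokens.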

Regards.

-- 
David Causse
Spotter
http://www.spotter.com/

