Right now I am using StandardAnalyzer, but the results are not what I'd hoped for. My understanding is also that we should search with the same analyzer that was used for indexing. So even if I could guess the language during indexing and apply the matching Snowball analyzer, I couldn't use Snowball for searching, because users want to search through both English and French documents at once, and I suppose a query analyzed with StandardAnalyzer would not match terms that were stemmed at index time?

You could try building a more complex query: expand the user's input into both languages by parsing it once with each language's analyzer and combining the results. Would this solve your problem?
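A minimal sketch of that expansion (untested; constructor signatures vary
across Lucene versions, and the field name "contents" is only a placeholder):

    import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;

    public class BilingualQuery {
        /** Expand one user query string into "English form OR French form". */
        public static Query expand(String userInput) throws ParseException {
            // Parse the same input twice, once per stemmer.
            Query en = new QueryParser("contents",
                    new SnowballAnalyzer("English")).parse(userInput);
            Query fr = new QueryParser("contents",
                    new SnowballAnalyzer("French")).parse(userInput);

            // SHOULD clauses: a document may match either stemmed form.
            BooleanQuery combined = new BooleanQuery();
            combined.add(en, BooleanClause.Occur.SHOULD);
            combined.add(fr, BooleanClause.Occur.SHOULD);
            return combined;
        }
    }

If each document was indexed with the Snowball analyzer for its (guessed)
language, the expanded query should match whichever stemmed form actually
ended up in the index.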




Another problem with StandardAnalyzer is that it breaks up some words that should not be broken (in our case, document identifiers such as ABC-1234, etc.), but that's a secondary issue...

This behaviour is implemented in the StandardTokenizer used by StandardAnalyzer. Look at the documentation of StandardTokenizer:


Many applications have specific tokenizer needs.  If this tokenizer does
not suit your application, please consider copying this source code
directory to your project and maintaining your own grammar-based tokenizer.
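
If copying the grammar is more than you need, and assuming the identifiers
can go into a field of their own (the field name "docId" below is just an
example), a PerFieldAnalyzerWrapper can keep them away from StandardTokenizer
entirely. Again only a sketch; exact signatures depend on your Lucene version:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.KeywordAnalyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class Analyzers {
        /** StandardAnalyzer for ordinary text fields, but identifier
         *  fields are emitted as single, unsplit tokens. */
        public static Analyzer build() {
            PerFieldAnalyzerWrapper wrapper =
                    new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            // KeywordAnalyzer treats the whole field value as one token,
            // so an id like ABC-1234 is not broken at the hyphen.
            wrapper.addAnalyzer("docId", new KeywordAnalyzer());
            return wrapper;
        }
    }

Use the same wrapper for both the IndexWriter and the QueryParser so that
indexing and searching stay consistent.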

Bernhard
