Re: QueryParser and compound words

Magnus Johansson Wed, 12 Mar 2003 00:18:05 -0800

Well, the problem arise when a user enters a query with a compound word
and the compound word itself is not indexed, only one of its parts.

For example the index contains a document with the following word:
fotboll (football).

Let's say the users searches for fotbollsmatch (football game). The word is split into fotboll and match and the phrase "fotboll match" is searched for. The user finds no matching document.

Comparing this to english the user would have found a document, however scored slightly lower than a document containing both the words football and game.

I agree with you that this might not be a problem. The user could be instructed to reformulate his query. However the behaviour for an english index and a swedish index would be different.

/magnus

Tatu Saloranta wrote:

On Tuesday 11 March 2003 03:05, Magnus Johansson wrote:

Hello

I have written an Analyzer for swedish. Compound words are common in swedish, therefore my Analyzer tries to split the compound words into its parts. For example the swedish word fotbollsmatch (football game) is split into fotboll and match.

(same applies to many other languages so this is a common problem I think).

However... I'm not sure why you consider this a problem? The reason quotes are added is that since a single token (as parsed by QueryParser) expands no multiple terms, it becomes a PhraseQuery. Same happen (should happen) during indexing, so end result should match word in both "normal" case (word is correctly spelled as compound word) and when word is (incorrectly) spelled with spaces? As to quotes; they are only shown when converting query to a String; internally there are no quotes to be matched.

-+ Tatu +-
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: QueryParser and compound words

Reply via email to