Well, the problem arises when a user enters a query with a compound word and the compound word itself is not indexed, only one of its parts.
For example, the index contains a document with the following word: fotboll (football).
Let's say the user searches for fotbollsmatch (football game). The word
is split into fotboll and match, and the phrase "fotboll match" is searched for.
The user finds no matching document.
Compare this to English: there the user would have found the document, although scored
slightly lower than a document containing both of the words football and game.
I agree with you that this might not be a problem. The user could be instructed
to reformulate his query. However, the behaviour for an English index and a Swedish
index would be different.
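
To make the difference concrete, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene's API: the class CompoundQuerySketch, the one-word analyze() table, and the two matching methods are all made up for illustration. It assumes a phrase needs every token in order, while separate words are OR'ed together.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only, not Lucene's API. Documents and queries are plain token lists.
class CompoundQuerySketch {

    // Hypothetical stand-in for the Swedish analyzer: splits one known compound.
    static List<String> analyze(String word) {
        if (word.equals("fotbollsmatch")) {
            return Arrays.asList("fotboll", "match");
        }
        return Arrays.asList(word);
    }

    // Phrase semantics: all query tokens must occur consecutively, in order.
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // OR semantics: at least one query token occurs in the document.
    static boolean anyTermMatches(List<String> doc, List<String> terms) {
        for (String term : terms) {
            if (doc.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Swedish: the document only contains "fotboll".
        List<String> swedishDoc = Arrays.asList("fotboll");
        List<String> swedishQuery = analyze("fotbollsmatch");
        System.out.println(phraseMatches(swedishDoc, swedishQuery));  // false

        // English: "football game" is already two words, so they are OR'ed.
        List<String> englishDoc = Arrays.asList("football");
        List<String> englishQuery = Arrays.asList("football", "game");
        System.out.println(anyTermMatches(englishDoc, englishQuery)); // true
    }
}
```

In this model the Swedish user would only get a hit if the split parts were combined like separate words instead of as a phrase.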
/magnus
Tatu Saloranta wrote:
On Tuesday 11 March 2003 03:05, Magnus Johansson wrote:
Hello
I have written an Analyzer for Swedish. Compound words are common in
Swedish, therefore my Analyzer tries to split the compound words
into their parts. For example, the Swedish word fotbollsmatch (football
game) is split into fotboll and match.
(The same applies to many other languages, so this is a common problem, I think.)
However... I'm not sure why you consider this a problem? The reason quotes
are added is that, since a single token (as parsed by QueryParser) expands to
multiple terms, it becomes a PhraseQuery. The same happens (or should happen)
during indexing, so the end result should match the word both in the "normal"
case (the word is correctly spelled as a compound word) and when the word is
(incorrectly) spelled with spaces.
As to the quotes: they are only shown when converting the query to a String;
internally there are no quotes to be matched.
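
A rough illustration of that symmetry, again as a toy model in plain Java and not Lucene's API (SymmetricAnalysisSketch and its one-entry compound table are hypothetical). It assumes the same analysis runs on both the document text and the query, and that the resulting tokens are matched as a phrase:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only, not Lucene's API.
class SymmetricAnalysisSketch {

    // Hypothetical analyzer: whitespace tokenizer plus a one-entry compound table.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.equals("fotbollsmatch")) {
                tokens.addAll(Arrays.asList("fotboll", "match"));
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Phrase semantics: the analyzed query must appear consecutively in the analyzed document.
    static boolean phraseMatches(String documentText, String queryText) {
        return Collections.indexOfSubList(analyze(documentText), analyze(queryText)) >= 0;
    }

    public static void main(String[] args) {
        String query = "fotbollsmatch";
        // Compound spelled correctly in the document.
        System.out.println(phraseMatches("en fotbollsmatch", query));  // true
        // Compound (incorrectly) spelled with a space in the document.
        System.out.println(phraseMatches("en fotboll match", query));  // true
        // Only one part present: the phrase cannot match.
        System.out.println(phraseMatches("fotboll", query));           // false
    }
}
```

The quotes never appear in this model either; they would only show up if the token list were printed as a query string.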
-+ Tatu +-
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
