> On Wednesday 12 March 2003 01:19, Magnus Johansson wrote:
> > Well, the problem arises when a user enters a query with a compound word
> > and the compound word itself is not indexed, only one of its parts.
>
> Yes, but the compound word itself is never used in the query either,
> assuming the same analyser is used (as it always should be)?
>
> > For example, the index contains a document with the following word: fotboll (football).
> > Let's say the user searches for fotbollsmatch (football game). The word
> > is split into fotboll and match, and the phrase "fotboll match" is
> > searched for.
> > The user finds no matching document.
>
> But the same happens during indexing; fotbollsmatch should be properly
> split and stemmed to the terms "fotboll" and "match", right?

Yes, but the word fotbollsmatch was never indexed in this example, only the word fotboll.
I want a query for fotbollsmatch to match a document containing the word fotboll.
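To make the difference concrete, here is a toy sketch in plain Java. It is not Lucene code: the two-word dictionary, the linking-"s" rule and the helper names (`decompound`, `analyze`, `phraseMatches`, `orScore`) are all hypothetical. It runs the same decompounding analyser at index and query time, then compares a phrase query over the parts with an OR of the parts (the way separate English words like "football game" would be combined).

```java
import java.util.*;

/** Toy illustration only (not Lucene): hypothetical decompounding analyser + two query styles. */
public class CompoundDemo {

    // Hypothetical dictionary of known Swedish word parts.
    static final Set<String> DICTIONARY = Set.of("fotboll", "match");

    /** Splits a word into dictionary parts; returns the word unchanged if no split is found. */
    static List<String> decompound(String word) {
        List<String> parts = split(word);
        return parts != null ? parts : List.of(word);
    }

    private static List<String> split(String rest) {
        if (rest.isEmpty()) return null;
        if (DICTIONARY.contains(rest)) return new ArrayList<>(List.of(rest));
        for (int i = 1; i < rest.length(); i++) {
            String head = rest.substring(0, i);
            if (!DICTIONARY.contains(head)) continue;
            String tail = rest.substring(i);
            // Try the remainder as-is, then with a Swedish linking "s" dropped.
            List<String> sub = split(tail);
            if (sub == null && tail.startsWith("s")) sub = split(tail.substring(1));
            if (sub != null) {
                sub.add(0, head);
                return sub;
            }
        }
        return null;
    }

    /** Lowercases, splits on non-letters and decompounds every token (used for docs AND queries). */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) terms.addAll(decompound(token));
        }
        return terms;
    }

    /** Phrase semantics: all query terms must appear consecutively in the document. */
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    /** OR semantics: count how many distinct query terms the document contains (0 = no match). */
    static int orScore(List<String> doc, List<String> query) {
        int score = 0;
        for (String term : new LinkedHashSet<>(query)) {
            if (doc.contains(term)) score++;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> docA = analyze("fotboll");        // only one part is in the document
        List<String> docB = analyze("fotbollsmatch");  // the compound itself is in the document
        List<String> query = analyze("fotbollsmatch"); // same analyser at query time

        System.out.println("query terms: " + query);                                               // [fotboll, match]
        System.out.println("phrase: A=" + phraseMatches(docA, query) + " B=" + phraseMatches(docB, query)); // A=false B=true
        System.out.println("OR:     A=" + orScore(docA, query) + " B=" + orScore(docB, query));             // A=1 B=2
    }
}
```

With the phrase query only the document that actually contained fotbollsmatch matches; with OR semantics the fotboll-only document also matches, just with a lower score, which is the English-like behaviour described here.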
> > Comparing this to English, the user would have found a document, though
> > scored slightly lower than a document containing both the words football and game.
> >
> > I agree with you that this might not be a problem. The user could be
> > instructed to reformulate his query. However, the behaviour for an English
> > index and a Swedish index would be different.
>
> I actually think that if the user has to be aware of internal stemming and
> reformulate the query, that would be a bit of a problem. :-)
> But I'm not 100% sure the search string would differ from the indexed string,
> assuming the same base token (the unprocessed token, i.e. "fotbollsmatch") was
> both contained in the document and searched for using QueryParser.
>
> I think that in general the behaviour is heavily dependent on the analyser
> (tokenizer + stemmer) being used, so it's probably different between most languages.

I think I'll accept how it works now. It is perhaps unlikely that a user would query
the index using a compound word and expect documents containing only one of its parts
in the result. The more I think about it, the more difficult it becomes to come up with
a realistic example of why the behaviour would need to be changed.

Thank you for your feedback
/magnus
