On Wednesday 12 March 2003 01:19, Magnus Johansson wrote: > Well, the problem arise when a user enters a query with a compound word > and the compound word itself is not indexed, only one of its parts.
Yes, but neither is compound word itself ever user in query either, assuming same analyser is used (like it always should)? > For example the index contains a document with the following word: > fotboll (football). > > Let's say the users searches for fotbollsmatch (football game). The word > is split into fotboll and match and the phrase "fotboll match" is > searched for. > The user finds no matching document. But same happens during indexing; fotbollsmatch should be properly split and stemmed to "fotboll" and "match" terms, right? > Comparing this to english the user would have found a document, however > scored > slightly lower than a document containing both the words football and game. > > I agree with you that this might not be a problem. The user could be > instructed > to reformulate his query. However the behaviour for an english index and I actually think that if user has to be aware of internal stemming and reformulate query I think this would be bit of a problem. :-) But I'm not 100% sure search string would differ from indexed string, assuming same base token (unprocessed token, ie "fotbollsmatch") was both contained in the document and searched for using QueryParser. > a swedish > index would be different. I think that in general behaviour is heavily dependant on analyser (tokenizer + stemmer) being used, so it's probably different between most languages. -+ Tatu +- --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
