On Thursday 13 March 2003 00:52, Magnus Johansson wrote:
> Tatu Saloranta wrote:
...
> >But same happens during indexing; fotbollsmatch should be properly
> >split and stemmed to "fotboll" and "match" terms, right?
>
> Yes but the word fotbollsmatch was never indexed in this example. Only
> the word fotboll.
> I want a query for fotbollsmatch to match a document containing the word
> fotboll.

OK, I think I finally understand what you meant. :-)

So, basically, in your case you would prefer the query:

  fotbollsmatch

to expand (after stemming etc.) to:

  fotboll match

and not:

  "fotboll match"

so that matching just one of the words would be enough for a hit
(either "either of", "just the first word", or "just the last word").

It should be possible to implement this by subclassing the default
QueryParser and modifying its behavior slightly. In QueryParser you
should be able to override the default handling for terms, so that
whenever a single token (in this case "fotbollsmatch") expands to
multiple Terms, you do not construct a PhraseQuery but a BooleanQuery
of TermQueries (look at getFieldQuery(); it handles basic search
terms). You may need a simple heuristic to detect white space that
indicates a "normal" phrase, which probably should still be handled
using PhraseQuery.

Of course this is all assuming you still do want that functionality. :-)
And if you do, it would be a good idea to contribute the patch back in
case someone else finds it useful later on (I think many non-English
languages have compound words; German and Finnish at least do).

-+ Tatu +-
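
For illustration, here is a minimal, library-free sketch of the decision described above: build a phrase only when the raw query text itself contained white space, and otherwise OR together the terms that a single token expanded to. Everything in it is an assumption made for the example: the class name CompoundQuerySketch, the toy analyze() method (which only knows how to split "fotbollsmatch"), and the string output standing in for real query objects. It does not call Lucene; in an actual patch this logic would presumably live in an overridden getFieldQuery().

```java
import java.util.ArrayList;
import java.util.List;

public class CompoundQuerySketch {

    // Hypothetical stand-in for an analyzer that splits compound words.
    // A real one would use a dictionary; this toy knows a single compound.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (word.equals("fotbollsmatch")) {
                tokens.add("fotboll");
                tokens.add("match");
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Heuristic from the message: white space in the raw text means the
    // user typed a "normal" multi-word phrase.
    static boolean containsWhitespace(String text) {
        String trimmed = text.trim();
        for (int i = 0; i < trimmed.length(); i++) {
            if (Character.isWhitespace(trimmed.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Returns a textual description of the query that would be built.
    static String buildQuery(String queryText) {
        List<String> tokens = analyze(queryText);
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return tokens.get(0);                          // single term
        }
        if (containsWhitespace(queryText)) {
            return "\"" + String.join(" ", tokens) + "\""; // phrase
        }
        // One input token expanded to several terms: any of them may match.
        return String.join(" OR ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("fotbollsmatch")); // fotboll OR match
        System.out.println(buildQuery("fotboll match")); // "fotboll match"
    }
}
```

In Lucene terms, the OR branch would correspond to a BooleanQuery whose clauses are optional TermQueries, and the phrase branch to the PhraseQuery that the stock parser already builds. The "just first word" or "just last word" variants would only change which of the expanded tokens are kept.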
