On Thursday 06 February 2003 11:12, D.L.B. wrote:
> Hi everyone,
>
> We've uncovered some perhaps undesirable behavior when doing a wildcard
> search against a stemmed index. These issues may be part of the problems
> referenced in the thread "Too few search results".
>
> The problem is that, for prefix and wildcard queries, the query string is
> not sent to the analyzer for tokenization (and stemming). This can result
> in expected hits not being returned. For example:
> ...
> I've coded a fix to this in QueryParser.jj. In the cases like the above,
> take the word without the '*' and send it to the analyzer. If a single
> token is returned, use it to create the PrefixQuery or WildcardQuery. So,
> if you search for "pipette*", send "pipette" to the analyzer, get "pipet"
> back, create the PrefixQuery using "pipet", not "pipette".
>
> If y'all feel this is an issue that needs fixin', let me know and I'll post
> my fix.

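To make the quoted approach concrete, here is a minimal sketch of how I read it. This is not the actual QueryParser.jj patch (which has not been posted) and it does not use Lucene's classes: the analyzer is modelled as a plain function from text to tokens, and `toyStemmingAnalyzer`, `prefixFor` and the class name are hypothetical names made up for illustration. The fallback for the case where the analyzer returns zero or several tokens is also my assumption, since the original only describes the single-token case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class PrefixTermSketch {

    // Hypothetical stand-in for a stemming analyzer (NOT a real stemmer):
    // lowercases, splits on non-alphanumerics, and strips a trailing "te"
    // from longer words, so that "pipette" becomes "pipet".
    static List<String> toyStemmingAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            boolean strip = raw.length() > 4 && raw.endsWith("te");
            tokens.add(strip ? raw.substring(0, raw.length() - 2) : raw);
        }
        return tokens;
    }

    // Returns the text a prefix query would be built from: the word without
    // its trailing '*', run through the analyzer. If the analyzer does not
    // return exactly one token, the unanalyzed word is used (an assumption).
    static String prefixFor(String queryTerm, Function<String, List<String>> analyzer) {
        String word = queryTerm.endsWith("*")
                ? queryTerm.substring(0, queryTerm.length() - 1)
                : queryTerm;
        List<String> tokens = analyzer.apply(word);
        return tokens.size() == 1 ? tokens.get(0) : word;
    }

    public static void main(String[] args) {
        // prints "pipet"
        System.out.println(prefixFor("pipette*", PrefixTermSketch::toyStemmingAnalyzer));
    }
}
```

Passing the analyzer in as a parameter keeps the parsing step independent of which analyzer is applied to the prefix term.
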
Perhaps a good fix would be to let QueryParser accept a second analyzer, in addition to the default one, that is used only for tokenizing wildcard and prefix terms? Often a simple analyzer (for example, one that just lowercases its input) should do nicely. This could be done by adding a method for setting such an analyzer; the default would be to use no analyzer for these terms, which keeps backwards compatibility.

I think the separation between "high-level" query parsing (i.e. handling modifiers, +/-, ?, field prefixes, AND, OR) and "low-level" analysis is a really good thing to have, and it would be good if that could work similarly with prefix queries too.

Just my 2c.,

-+ Tatu +-
