Hi everyone,

We've uncovered some possibly undesirable behavior when doing a wildcard search against a stemmed index. These issues may be part of the problems referenced in the thread "Too few search results".

The problem is that, for prefix and wildcard queries, the query string is not sent to the analyzer for tokenization (and stemming), so expected hits may not be returned. For example:

"pipette" gets stemmed to "pipet". A search on "pipette*" will not match documents containing "pipette", because "pipette" was stemmed at index time to "pipet".

"cylinder" gets stemmed to "cylind". A search on "*cylinder" will not match documents containing "pipetcylinder", because "pipetcylinder" was stemmed at index time to "pipetcylind".

I've coded a fix for this in QueryParser.jj. In cases like the above, take the word without the '*' and send it to the analyzer. If a single token is returned, use it to create the PrefixQuery or WildcardQuery. So, if you search for "pipette*", send "pipette" to the analyzer, get "pipet" back, and create the PrefixQuery using "pipet", not "pipette".

If y'all feel this is an issue that needs fixin', let me know and I'll post my fix.

Thanks,
David Birtwell
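
P.S. To make the idea concrete, here is a rough standalone sketch of the approach described above. It is not the actual QueryParser.jj change and it does not use the Lucene API: the class name, the method names, and the two-rule toy stemmer are all made up for illustration, and the stemmer only knows enough to reproduce the "pipette" and "cylinder" examples. It strips a leading or trailing '*', runs the remaining word through the stand-in analyzer, and rebuilds the wildcard term only if exactly one token comes back.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; every name here is hypothetical, not Lucene's.
class WildcardTermSketch {

    // Toy stand-in for a stemming analyzer: lowercases, splits on
    // whitespace, and stems each word with two hard-coded rules.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(stem(word));
            }
        }
        return tokens;
    }

    // Assumed rules, chosen only to match the examples in this mail:
    // "...ette" -> "...et" and "...er" -> "...".
    static String stem(String word) {
        if (word.endsWith("ette") || word.endsWith("er")) {
            return word.substring(0, word.length() - 2);
        }
        return word;
    }

    // Strip a leading and/or trailing '*', analyze the bare word, and
    // re-attach the wildcard if the analyzer returned a single token.
    static String normalizeWildcardTerm(String term) {
        boolean leading = term.startsWith("*");
        boolean trailing = term.length() > 1 && term.endsWith("*");
        if (!leading && !trailing) {
            return term; // not a prefix/suffix wildcard term
        }
        int start = leading ? 1 : 0;
        int end = trailing ? term.length() - 1 : term.length();
        if (start >= end) {
            return term; // nothing left besides wildcards
        }
        String bare = term.substring(start, end);
        if (bare.indexOf('*') >= 0 || bare.indexOf('?') >= 0) {
            return term; // interior wildcards are left untouched here
        }
        List<String> tokens = analyze(bare);
        if (tokens.size() != 1) {
            return term; // zero or several tokens: keep the term as typed
        }
        return (leading ? "*" : "") + tokens.get(0) + (trailing ? "*" : "");
    }

    public static void main(String[] args) {
        String indexed = stem("pipette"); // what the index holds: "pipet"
        String raw = "pipette*";
        String fixed = normalizeWildcardTerm(raw);
        // Compare the indexed term against each prefix (term minus '*').
        System.out.println(indexed.startsWith(raw.substring(0, raw.length() - 1)));
        System.out.println(indexed.startsWith(fixed.substring(0, fixed.length() - 1)));
    }
}
```

With these toy rules, the raw prefix "pipette" misses the indexed term "pipet", while the analyzed prefix "pipet" matches it. When the analyzer returns zero tokens or more than one, the sketch leaves the term exactly as typed.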
