Example:
built* (->built) could be changed to build* (no built, but ->builder, building, etc.), and precision will go down drastically.
You probably use a stemmer with one important bug (a.k.a. feature) - overstemming, so here is another example:
political* (->political, politically) is transformed to polic* (->policer, policy, policies, policemen, etc.) by the Porter algorithm, and the precision is again affected drastically.
-g-
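To illustrate the precision point above, here is a minimal sketch in Python. It assumes a made-up term dictionary and a hypothetical helper `expand`, and it takes the rewritten prefix "polic" from the message as given rather than running a real Porter stemmer.

```python
# Toy term dictionary; these terms are invented for illustration only.
VOCABULARY = {
    "political", "politically",
    "police", "policeman", "policies", "policy",
}

def expand(prefix, vocabulary):
    """Return every dictionary term that a prefix query would expand to."""
    return sorted(term for term in vocabulary if term.startswith(prefix))

# What the user typed versus what an overstemmed prefix would match.
intended = expand("political", VOCABULARY)
rewritten = expand("polic", VOCABULARY)

# Terms matched by the rewritten prefix that the user never asked for.
unwanted = [term for term in rewritten if term not in intended]

print(intended)   # ['political', 'politically']
print(rewritten)  # ['police', 'policeman', 'policies', 'policy']
print(unwanted)   # ['police', 'policeman', 'policies', 'policy']
```

In this toy dictionary the rewritten prefix matches none of the intended terms, so every hit it produces is noise.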
[EMAIL PROTECTED] wrote:
Your analyzers can optionally incorporate stemming, along with the other things that analyzers do (lowercasing, etc.). The stemming algorithms are all different. This "searcher" example was made up, but there are instances where stemming at index time and not stemming wildcard searches will result in lost hits. Specifically, we encountered this situation using the optional Snowball analyzers (which work great, by the way).
DaveB
Leo Galambos <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
cc:
05/30/03 10:26 AM
Subject: Re: Lowercasing wildcards - why?
Please respond to "Lucene Users List"
I'm sorry, I did not read the complete thread. Do you mean analyzer == stemmer? Does it really work? If I were a stemmer, I would leave "searche" intact. ;-)
-g-
[EMAIL PROTECTED] wrote:
Hi Les,
We ended up modifying the QueryParser to pass prefix and suffix queries
through the Analyzer. For us, it was about stemming. If you decide to use
an analyzer that incorporated stemming, there are cases where wildcard queries will not return the expected results.
Example: "searcher" will probably get stemmed to "search". A search on
"searche*" should hit the term "searcher", but it won't, all instances of
"searcher" having been stemmed to "search" at index time. Our solution was
to remove the trailing wildcard and send "searche" to the analyzer, then
tack the wildcard character back on there and create the PrefixQuery object
with the new search string "search*".
DaveB
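The workaround described above can be sketched roughly as follows. This is Python rather than the actual QueryParser change, and everything in it is an assumption for illustration: `toy_stem` is a crude stand-in for a real stemmer, and `analyze`, `build_index`, `prefix_search` and `rewrite_prefix_query` are hypothetical helpers, not Lucene calls.

```python
def toy_stem(word):
    """Crude stand-in for a real stemmer: strips one common suffix."""
    for suffix in ("er", "ing", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Lowercase, split on whitespace, and stem each token."""
    return [toy_stem(token) for token in text.lower().split()]

def build_index(docs):
    """Map each analyzed (stemmed) term to the ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def prefix_search(index, prefix):
    """Return ids of documents with at least one indexed term starting with prefix."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return sorted(hits)

def rewrite_prefix_query(query):
    """Strip the trailing '*', analyze the remainder, then tack the '*' back on."""
    if not query.endswith("*"):
        raise ValueError("expected a trailing wildcard")
    tokens = analyze(query[:-1])
    # If the analyzer produces nothing, leave the query untouched.
    return tokens[0] + "*" if tokens else query

index = build_index({1: "the searcher found it", 2: "nothing here"})

print(prefix_search(index, "searche"))    # [] because "searcher" was indexed as "search"
rewritten = rewrite_prefix_query("searche*")
print(rewritten)                          # search*
print(prefix_search(index, rewritten[:-1]))  # [1]
```

The raw prefix misses document 1 because the index only holds the stemmed term, while the analyzed prefix finds it again.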
Leslie Hughes <[EMAIL PROTECTED]ion.com.au>
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
05/30/03 01:09 AM
Subject: Lowercasing wildcards - why?
Please respond to "Lucene Users List"
Hi,
I was just wondering what the rationale is behind lowercasing wildcard
queries produced by QueryParser? It's just that my data is all upper case
and my analyser doesn't lowercase, so it seems a bit odd that I have to call
setLowercaseWildcardTerms(false). Couldn't QueryParser leave the terms unnormalised or, better still, pass them through the analyser?
I'm sure there's a good reason for it though.....
Les
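The mismatch described in this question can be reproduced with a small Python sketch. It is a toy model under stated assumptions, not the QueryParser itself: `expand_prefix` and its `lowercase_wildcard_terms` flag are hypothetical names that only mimic the behaviour being asked about, and the indexed terms are invented.

```python
# Terms as an analyzer that does not lowercase would have indexed them.
INDEXED_TERMS = {"LUCENE", "LUCID", "QUERY"}

def expand_prefix(prefix, indexed_terms, lowercase_wildcard_terms=True):
    """Expand a prefix against the index, optionally lowercasing it first."""
    if lowercase_wildcard_terms:
        prefix = prefix.lower()
    return sorted(term for term in indexed_terms if term.startswith(prefix))

# Default behaviour: the prefix is lowercased and no upper-case term matches.
print(expand_prefix("LUC", INDEXED_TERMS))
# [] 

# With lowercasing switched off, the prefix matches the indexed terms again.
print(expand_prefix("LUC", INDEXED_TERMS, lowercase_wildcard_terms=False))
# ['LUCENE', 'LUCID']
```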
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
