Ah, I got it. THX. In the good old days, wildcards were used as a workaround for a missing stemming module. I am not sure you can combine these two opposite approaches successfully. I see the following drawbacks in your solution.

Example:
built* (-> built) could be rewritten to build* (which no longer matches built, but does match builder, building, etc.), so precision will drop drastically.


You probably use a stemmer with one important bug (a.k.a. feature): overstemming. So here is another example:
political* (-> political, politically) is transformed to polic* (-> policer, policy, policies, policemen, etc.) by the Porter algorithm, and precision is again affected drastically.
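To make the precision drop concrete, here is a minimal sketch that expands a prefix against a small, made-up vocabulary. It does not use Lucene or a real stemmer: the class name, the term list, and the `expand` helper are all hypothetical, and the rewritten prefixes (`build`, `polic`) are simply the ones quoted above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PrefixExpansionDemo {
    // Hypothetical index vocabulary (unstemmed), chosen only for illustration.
    static final List<String> TERMS = List.of(
            "build", "builder", "building", "built",
            "policy", "policies", "political", "politically");

    // Returns every vocabulary term that starts with the given prefix, sorted.
    static List<String> expand(String prefix) {
        return TERMS.stream()
                .filter(term -> term.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // What the user typed versus what the rewritten prefix would match.
        System.out.println("built*     -> " + expand("built"));
        System.out.println("build*     -> " + expand("build"));
        System.out.println("political* -> " + expand("political"));
        System.out.println("polic*     -> " + expand("polic"));
    }
}
```

In this toy vocabulary the original and the rewritten prefix match disjoint sets of terms, which is the effect described above.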


-g-

[EMAIL PROTECTED] wrote:

Your analyzers can optionally incorporate stemming, along with the other
things that analyzers do (lowercasing, etc...).  The stemming algorithms
are all different.  This "searcher" example was made up, but there are
instances where stemming at index time and not stemming wildcard searches
will result in lost hits.  Specifically, we encountered this situation
using the optional Snowball analyzers (which work great, by the way).
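The lost-hit case can be reproduced without Lucene. The sketch below is only an illustration: `toyStem` is a hypothetical stand-in that strips a trailing "ing" or "er" and is far cruder than any Snowball stemmer, and the class and method names are invented for this example.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class LostHitsDemo {
    // Toy stand-in for a real stemmer: lowercases, then strips "ing" or "er".
    static String toyStem(String token) {
        String t = token.toLowerCase();
        if (t.endsWith("ing") && t.length() > 5) {
            return t.substring(0, t.length() - 3);
        }
        if (t.endsWith("er") && t.length() > 4) {
            return t.substring(0, t.length() - 2);
        }
        return t;
    }

    // Index time: every token is stemmed before it is stored.
    static Set<String> index(List<String> tokens) {
        Set<String> terms = new TreeSet<>();
        for (String token : tokens) {
            terms.add(toyStem(token));
        }
        return terms;
    }

    // Query time: the prefix is used as typed, without stemming.
    static boolean prefixMatches(Set<String> terms, String prefix) {
        return terms.stream().anyMatch(term -> term.startsWith(prefix));
    }

    public static void main(String[] args) {
        Set<String> terms = index(List.of("searcher", "searching"));
        System.out.println("indexed terms: " + terms);
        System.out.println("searche* hits: " + prefixMatches(terms, "searche"));
        System.out.println("search*  hits: " + prefixMatches(terms, "search"));
    }
}
```

Because only the stemmed form is stored, the unstemmed prefix has nothing left to match.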

DaveB




Leo Galambos <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
cc:
05/30/03 10:26 AM
Subject: Re: Lowercasing wildcards - why?
Please respond to "Lucene Users List"





I'm sorry, I did not read the complete thread. Do you mean analyzer == stemmer? Does it really work? If I were a stemmer, I would leave "searche" intact. ;-)

-g-

[EMAIL PROTECTED] wrote:



Hi Les,

We ended up modifying the QueryParser to pass prefix and suffix queries
through the Analyzer. For us, it was about stemming. If you decide to use
an analyzer that incorporated stemming, there are cases where wildcard
queries will not return the expected results.

Example: "searcher" will probably get stemmed to "search". A search on
"searche*" should hit the term "searcher", but, it won't, all instances of
"searcher" having been stemmed to "search" at index time. Our solution was
to remove the trailing wildcard and send "searche" to the analyzer, then
tack the wildcard character back on there and create the PrefixQuery object
with the new search string "search*".
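The strip, analyze, and re-append steps described above can be sketched roughly as follows. This is not the actual QueryParser modification: the analyzer is modeled as a plain `UnaryOperator<String>`, and `rewritePrefix` and `toyAnalyze` are hypothetical names. The toy analyzer only lowercases and strips a trailing "er" or "e", which is an assumption made so that "searche" reduces to "search".

```java
import java.util.function.UnaryOperator;

final class PrefixRewriteSketch {
    // Removes a trailing '*', runs the remainder through the analyzer,
    // then puts the wildcard back. Terms without a wildcard are analyzed as-is.
    static String rewritePrefix(String queryTerm, UnaryOperator<String> analyzer) {
        if (!queryTerm.endsWith("*")) {
            return analyzer.apply(queryTerm);
        }
        String withoutWildcard = queryTerm.substring(0, queryTerm.length() - 1);
        return analyzer.apply(withoutWildcard) + "*";
    }

    // Toy stand-in for a stemming analyzer (assumption, not a real algorithm).
    static String toyAnalyze(String token) {
        String t = token.toLowerCase();
        if (t.endsWith("er") && t.length() > 4) {
            return t.substring(0, t.length() - 2);
        }
        if (t.endsWith("e") && t.length() > 3) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(rewritePrefix("searche*", PrefixRewriteSketch::toyAnalyze));
        System.out.println(rewritePrefix("Searcher", PrefixRewriteSketch::toyAnalyze));
    }
}
```

The resulting string is what would then be handed to the prefix query, so the query side and the index side go through the same normalization.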

DaveB










Leslie Hughes <[EMAIL PROTECTED]ion.com.au>
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
05/30/03 01:09 AM
Subject: Lowercasing wildcards - why?
Please respond to "Lucene Users List"













Hi,

I was just wondering what the rationale is behind lowercasing wildcard
queries produced by QueryParser? It's just that my data is all upper case
and my analyser doesn't lowercase so it seems a bit odd that I have to call
setLowercaseWildcardTerms(false). Couldn't queryparser leave the terms
unnormalised or better still pass them through the analyser?
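The mismatch can be illustrated with a small sketch that does not call Lucene at all. The `lowercasePrefix` flag is a hypothetical stand-in for the behaviour that setLowercaseWildcardTerms is described as controlling, and the class name and the upper-case term list are made up.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class WildcardCaseDemo {
    // Matches indexed terms against a prefix, optionally lowercasing the
    // prefix first (the indexed terms themselves are never changed).
    static List<String> prefixSearch(List<String> indexedTerms,
                                     String prefix,
                                     boolean lowercasePrefix) {
        String effective = lowercasePrefix ? prefix.toLowerCase(Locale.ROOT) : prefix;
        return indexedTerms.stream()
                .filter(term -> term.startsWith(effective))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Upper-case data indexed by an analyser that does not lowercase.
        List<String> terms = List.of("LUCENE", "LUCID", "QUERY");
        System.out.println("lowercased prefix: " + prefixSearch(terms, "LUC", true));
        System.out.println("prefix as typed:   " + prefixSearch(terms, "LUC", false));
    }
}
```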

I'm sure there's a good reason for it though.....


Les




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






