Re: Lowercasing wildcards - why?

Leo Galambos Sat, 31 May 2003 02:01:54 -0700

Ah, I got it. Thanks. In the good old days, wildcards were used as a workaround for a missing stemming module. I am not sure you can combine these two opposing approaches successfully. I see the following drawbacks in your solution.

Example: built* (->built) could be changed to build* (no longer matches built, but ->builder, building, etc.), and precision will go down drastically.

You probably use a stemmer with one important bug (a.k.a. feature), overstemming, so here is another example: political* (->political, politically) is transformed to polic* (->policer, policy, policies, policemen, etc.) by the Porter algorithm, and precision is again affected drastically.

-g-
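
A minimal sketch of the precision concern raised above, in Python for brevity. Everything here is hypothetical: the vocabulary is made up, the stem table simply copies the two mappings quoted in the message (it is not checked against a real Porter implementation), and "relevant" is assumed to mean whatever the unstemmed prefix would have matched.

```python
# Hypothetical stem table: the two mappings are taken from the message as-is.
ASSUMED_STEMS = {"built": "build", "political": "polic"}

# Made-up, unstemmed vocabulary for illustration only.
VOCAB = ["builder", "building", "built",
         "policies", "policy", "political", "politically"]


def expand(prefix, vocabulary):
    """Return the vocabulary terms that a prefix wildcard would match."""
    return sorted(t for t in vocabulary if t.startswith(prefix))


def precision(returned, relevant):
    """Fraction of returned terms that are also in the relevant set."""
    if not returned:
        return 0.0
    return len(set(returned) & set(relevant)) / len(returned)


def compare(prefix):
    """Expand a prefix before and after stemming it, and score the result."""
    raw = expand(prefix, VOCAB)
    stemmed = expand(ASSUMED_STEMS.get(prefix, prefix), VOCAB)
    # Assumption: the raw expansion is what the user actually wanted.
    return raw, stemmed, precision(stemmed, raw)


if __name__ == "__main__":
    for p in ("built", "political"):
        print(p, compare(p))
```

With this toy data the stemmed prefix matches a completely different set of terms in both cases, which is the effect described in the message.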

[EMAIL PROTECTED] wrote:

Your analyzers can optionally incorporate stemming, along with the other
things that analyzers do (lowercasing, etc...).  The stemming algorithms
are all different.  This "searcher" example was made up, but there are
instances where stemming at index time and not stemming wildcard searches
will result in lost hits.  Specifically, we encountered this situation
using the optional Snowball analyzers (which work great, by the way).
DaveB

Leo Galambos <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
cc:
Date: 05/30/03 10:26 AM
Subject: Re: Lowercasing wildcards - why?
Please respond to "Lucene Users List"

I'm sorry, I did not read the complete thread. Do you mean - analyzer ==
stemmer? Does it really work? If I were a stemmer, I would leave "searche"
intact. ;-)
-g-

[EMAIL PROTECTED] wrote:

Hi Les,

We ended up modifying the QueryParser to pass prefix and suffix queries through the Analyzer. For us, it was about stemming. If you decide to use an analyzer that incorporates stemming, there are cases where wildcard queries will not return the expected results.

Example: "searcher" will probably get stemmed to "search". A search on "searche*" should hit the term "searcher", but it won't, because all instances of "searcher" were stemmed to "search" at index time. Our solution was to remove the trailing wildcard and send "searche" to the analyzer, then tack the wildcard character back on and create the PrefixQuery object with the new search string "search*".

DaveB
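
The workaround described above can be sketched roughly as follows, in Python for brevity. This is not the actual QueryParser patch: `toy_stem`, `analyze`, `rewrite_prefix_query`, and `prefix_search` are hypothetical names, and the suffix-stripping rule is a crude stand-in for a real stemmer.

```python
def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ers", "er", "ing", "es", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Stand-in for an analyzer that lowercases and then stems."""
    return toy_stem(text.lower())


def rewrite_prefix_query(query):
    """Strip the trailing '*', analyze the prefix, then re-append the '*'."""
    if not query.endswith("*"):
        return analyze(query)
    return analyze(query[:-1]) + "*"


def prefix_search(index_terms, query):
    """Return the indexed terms that start with the query's prefix."""
    prefix = query.rstrip("*")
    return sorted(t for t in index_terms if t.startswith(prefix))


if __name__ == "__main__":
    # Terms are stemmed at index time, so "searcher" is stored as "search".
    index = {analyze(w) for w in ["searcher", "searching", "index"]}
    print(prefix_search(index, "searche*"))                        # no hits
    print(prefix_search(index, rewrite_prefix_query("searche*")))  # hits "search"
```

In this toy setup the raw query "searche*" finds nothing, while the rewritten query "search*" matches the stemmed term.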

Leslie Hughes <[EMAIL PROTECTED]ion.com.au>
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
cc:
Date: 05/30/03 01:09 AM
Subject: Lowercasing wildcards - why?
Please respond to "Lucene Users List"

Hi,

I was just wondering what the rationale is behind lowercasing wildcard queries produced by QueryParser? It's just that my data is all upper case and my analyser doesn't lowercase, so it seems a bit odd that I have to call setLowercaseWildcardTerms(false). Couldn't QueryParser leave the terms unnormalised or, better still, pass them through the analyser?

I'm sure there's a good reason for it though...

Les
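
A small illustration of the mismatch described in this question, in Python for brevity. It does not use Lucene itself: `build_index` and `wildcard_terms` are hypothetical helpers, and the `lowercase_wildcard` flag only mimics the effect that setLowercaseWildcardTerms is described as having in the message.

```python
def build_index(docs, lowercase):
    """Collect whitespace-separated terms, optionally lowercasing them."""
    terms = set()
    for doc in docs:
        for tok in doc.split():
            terms.add(tok.lower() if lowercase else tok)
    return terms


def wildcard_terms(index, query, lowercase_wildcard):
    """Match a trailing-'*' query against the index, optionally lowercased."""
    prefix = query.rstrip("*")
    if lowercase_wildcard:
        prefix = prefix.lower()
    return sorted(t for t in index if t.startswith(prefix))


if __name__ == "__main__":
    # Upper-case data indexed by an analyser that does not lowercase.
    index = build_index(["LUCENE QUERY", "LUCID"], lowercase=False)
    print(wildcard_terms(index, "LUC*", lowercase_wildcard=True))   # no hits
    print(wildcard_terms(index, "LUC*", lowercase_wildcard=False))  # hits
```

When the index keeps upper-case terms, lowercasing only the wildcard prefix makes the query miss every term, so the query-side normalisation has to agree with what the analyser did at index time.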
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]