True enough. We're supporting search of a product database, so, for us, it
made sense to increase coverage and accept the loss of precision. Our
solution is definitely not globally applicable.
DaveB
Leo Galambos
<[EMAIL PROTECTED]> To: Lucene Users List
<[EMAIL PROTECTED]>
05/30/03 11:55 AM cc:
Please respond to Subject: Re: Lowercasing wildcards -
why?
"Lucene Users
List"
Ah, I got it. THX. In the good old days, the wildcards were used as a
fix for missing stemming module. I am not sure if you can combine these
two opposite approaches successfully. I see the following drawbacks of
your solution.
Example:
built* (->built) could be changed to build* (no built, but ->builder,
building, etc.), and precision will go down drastically.
You probably use a stemmer with one important bug (a.k.a. feature) -
overstemming, so here is another example:
political* (->political, politically) is transformed to polic*
(->policer, policy, policies, policement etc.) by Porter alg., and the
precision is again affected drastically
-g-
[EMAIL PROTECTED] wrote:
>Your analyzers can optionally incorporate stemming, along with the other
>things that analyzers do (lowercasing, etc...). The stemming algorithms
>are all different. This "searcher" example was made up, but, there are
>instances where stemming at index time and not stemming wildcard searches
>will result in lost hits. Specifically, we encountered this situation
>using the optional Snoball analyzers (which work great, by the way).
>
>DaveB
>
>
>
>
>
> Leo Galambos
> <[EMAIL PROTECTED]> To: Lucene Users List
>
<[EMAIL PROTECTED]>
> 05/30/03 10:26 AM cc:
> Please respond to Subject: Re: Lowercasing
wildcards - why?
> "Lucene Users
> List"
>
>
>
>
>
>
>I'm sorry, I did not read the complete thread. Do you mean - analyzer ==
>stemmer? Does it really work? If I was a stemmer, I would let "searche"
>intact. ;-)
>
>-g-
>
>[EMAIL PROTECTED] wrote:
>
>
>
>>Hi Les,
>>
>>We ended up modifying the QueryParser to pass prefix and suffix queries
>>through the Analyzer. For us, it was about stemming. If you decide to
>>
>>
>use
>
>
>>an analyzer that incorporated stemming, there are cases where wildcard
>>queries will not return the expected results.
>>
>>Example: "searcher" will probably get stemmed to "search". A search on
>>"searche*" should hit the term "searcher", but, it won't, all instances
of
>>"searcher" having been stemmed to "search" at index time. Our solution
>>
>>
>was
>
>
>>to remove the trailing wildcard and send "searche" to the analyzer, then
>>tack the wildcard character back on there and create the PrefixQuery
>>
>>
>object
>
>
>>with the new search string "search*".
>>
>>DaveB
>>
>>
>>
>>
>>
>>
>>
>
>
>
>> Leslie Hughes
>>
>>
>
>
>
>> <[EMAIL PROTECTED] To:
>>
>>
>"'[EMAIL PROTECTED]'"
>
>
>> ion.com.au>
>>
>>
><[EMAIL PROTECTED]>
>
>
>> cc:
>>
>>
>
>
>
>> 05/30/03 01:09 AM Subject:
>>
>>
>Lowercasing wildcards - why?
>
>
>> Please respond to "Lucene
>>
>>
>
>
>
>> Users List"
>>
>>
>
>
>
>
>
>
>
>
>
>>
>>
>>Hi,
>>
>>I was just wondering what the rationale is behind lowercasing wildcard
>>queries produced by QueryParser? It's just that my data is all upper case
>>and my analyser doesn't lowercase so it seems a bit odd that I have to
>>
>>
>call
>
>
>>setLowercaseWildcardTerms(false). Couldn't queryparser leave the terms
>>unnormalised or better still pass them through the analyser?
>>
>>I'm sure there's a good reason for it though.....
>>
>>
>>Les
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>>
>>
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>>
>>
>>
>>
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]