Re: Lowercasing wildcards - why?

David_Birthwell Sat, 31 May 2003 02:09:29 -0700

True enough.  We're supporting search of a product database, so, for us, it
made sense to increase coverage and accept the loss of precision.  Our
solution is definitely not globally applicable.
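
As a rough illustration of the approach described further down in this thread (strip the trailing wildcard, run the remainder through the analyzer, then re-append the wildcard), here is a minimal sketch. It is not the actual QueryParser modification: `analyze` is a toy suffix stripper standing in for a real stemming analyzer, and the class and method names are hypothetical.

```java
import java.util.Locale;

/** Sketch only: a toy stand-in for a stemming analyzer, not Lucene's API. */
final class AnalyzedPrefixSketch {

    /** Hypothetical analyzer step: lowercase, then strip one common suffix. */
    static String analyze(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ers", "er", "ing", "es", "e", "s"}) {
            // Keep at least three characters so short words are left alone.
            if (t.endsWith(suffix) && t.length() - suffix.length() >= 3) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    /** Strip the trailing wildcard, analyze the remainder, re-append the wildcard. */
    static String rewritePrefixQuery(String query) {
        if (!query.endsWith("*")) {
            return analyze(query);
        }
        String prefix = query.substring(0, query.length() - 1);
        return analyze(prefix) + "*";
    }

    public static void main(String[] args) {
        // "searcher" is indexed as "search", so the query prefix must be stemmed too.
        System.out.println(analyze("searcher"));
        System.out.println(rewritePrefixQuery("searche*"));
    }
}
```

With this toy stemmer, "searche*" is rewritten to "search*", which can match the indexed form of "searcher". A real stemmer will produce different stems, so the rewritten prefix depends entirely on the analyzer in use.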


DaveB




                                                                                       
                    
From: Leo Galambos <[EMAIL PROTECTED]>
To: Lucene Users List <[EMAIL PROTECTED]>
Date: 05/30/03 11:55 AM
Subject: Re: Lowercasing wildcards - why?
Please respond to: "Lucene Users List"




Ah, I got it. Thanks. In the good old days, wildcards were used as a
substitute for a missing stemming module. I am not sure you can combine
these two opposing approaches successfully. I see the following drawbacks
in your solution.

Example:
built* (matches "built") could be changed to build* (which no longer
matches "built", but does match "builder", "building", etc.), and
precision will go down drastically.

You probably use a stemmer with one important bug (a.k.a. feature),
overstemming, so here is another example:
political* (matches "political", "politically") is transformed to polic*
(matches "policer", "policy", "policies", "policemen", etc.) by the Porter
algorithm, and precision is again affected drastically.
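
The built*/build* case above can be illustrated with a small sketch. The term list is a made-up vocabulary of surface words, and `expand` is a hypothetical helper that only mimics prefix matching; it is not how Lucene expands a PrefixQuery internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: shows how a rewritten prefix changes the set of matching words. */
final class PrefixPrecisionSketch {

    /** Return every term that starts with the query's prefix (trailing '*' removed). */
    static List<String> expand(String prefixQuery, List<String> terms) {
        String prefix = prefixQuery.endsWith("*")
                ? prefixQuery.substring(0, prefixQuery.length() - 1)
                : prefixQuery;
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary, for illustration only.
        List<String> terms = Arrays.asList("build", "builder", "building", "built");
        System.out.println(expand("built*", terms)); // [built]
        System.out.println(expand("build*", terms)); // [build, builder, building]
    }
}
```

The user asked for "built*" and would get one word back; after the prefix is rewritten to "build*", the result set contains three other words and no longer includes "built" at all.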

-g-

[EMAIL PROTECTED] wrote:

>Your analyzers can optionally incorporate stemming, along with the other
>things that analyzers do (lowercasing, etc.).  The stemming algorithms
>are all different.  This "searcher" example was made up, but there are
>instances where stemming at index time and not stemming wildcard searches
>will result in lost hits.  Specifically, we encountered this situation
>using the optional Snowball analyzers (which work great, by the way).
>
>DaveB
>
>
>
>
>

>From: Leo Galambos <[EMAIL PROTECTED]>
>To: Lucene Users List <[EMAIL PROTECTED]>
>Date: 05/30/03 10:26 AM
>Subject: Re: Lowercasing wildcards - why?
>Please respond to: "Lucene Users List"
>
>I'm sorry, I did not read the complete thread. Do you mean analyzer ==
>stemmer? Does it really work? If I were a stemmer, I would leave "searche"
>intact. ;-)
>
>-g-
>
>[EMAIL PROTECTED] wrote:
>
>
>
>>Hi Les,
>>
>>We ended up modifying the QueryParser to pass prefix and suffix queries
>>through the Analyzer.  For us, it was about stemming.  If you decide to
>>use an analyzer that incorporates stemming, there are cases where
>>wildcard queries will not return the expected results.
>>
>>Example:  "searcher" will probably get stemmed to "search".  A search on
>>"searche*" should hit the term "searcher", but it won't, because all
>>instances of "searcher" were stemmed to "search" at index time.  Our
>>solution was to remove the trailing wildcard and send "searche" to the
>>analyzer, then tack the wildcard character back on and create the
>>PrefixQuery object with the new search string "search*".
>>
>>DaveB
>>
>>
>>
>>
>>
>>
>>
>
>
>
>>From: Leslie Hughes <[EMAIL PROTECTED]ion.com.au>
>>To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
>>Date: 05/30/03 01:09 AM
>>Subject: Lowercasing wildcards - why?
>>Please respond to: "Lucene Users List"
>>
>>
>>Hi,
>>
>>I was just wondering what the rationale is behind lowercasing wildcard
>>queries produced by QueryParser? It's just that my data is all upper case
>>and my analyser doesn't lowercase, so it seems a bit odd that I have to
>>call setLowercaseWildcardTerms(false). Couldn't QueryParser leave the
>>terms unnormalised or, better still, pass them through the analyser?
>>
>>I'm sure there's a good reason for it though.....
>>
>>
>>Les
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: [EMAIL PROTECTED]
>>For additional commands, e-mail: [EMAIL PROTECTED]
