RE: New PhrasePrefixQuery.java

Anders Nielsen Wed, 20 Nov 2002 14:22:55 -0800

Hello Konrad & Doug,

How about something like this? I've added two internal classes, PhraseTerm and
PhraseWildcardTerm, to encapsulate the different handling of the two kinds of
terms inside the PhraseQuery class.


The external interface of PhrasePrefixQuery has been changed so that
add(Term[] terms) becomes addWildcardTerm(Term term) as per Doug's
suggestion.

The terms inside each PhraseWildcardTerm object are expanded when scorer is
called on the PhrasePrefixQuery class.
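
A minimal sketch of that deferred expansion, for illustration only. It is not
the attached diff: the class PhrasePrefixSketch, the Slot helper, and the
expand method are hypothetical names, plain strings stand in for Term, and a
sorted set of strings stands in for the terms an IndexReader would enumerate.
Only prefix expansion is shown, not general wildcard patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for PhrasePrefixQuery; names are illustrative. */
public class PhrasePrefixSketch {

    /** One phrase position: an exact term, or a prefix to expand later. */
    static final class Slot {
        final String text;
        final boolean wildcard;

        Slot(String text, boolean wildcard) {
            this.text = text;
            this.wildcard = wildcard;
        }
    }

    // Bookkeeping: remembers which positions still need expansion.
    private final List<Slot> slots = new ArrayList<>();

    public void add(String term) {
        slots.add(new Slot(term, false));
    }

    public void addWildcardTerm(String prefix) {
        slots.add(new Slot(prefix, true));
    }

    /**
     * Expands every wildcard slot against the index terms. In the real query
     * this step would run inside scorer, once an IndexReader is available.
     */
    public List<List<String>> expand(SortedSet<String> indexTerms) {
        List<List<String>> positions = new ArrayList<>();
        for (Slot slot : slots) {
            List<String> alternatives = new ArrayList<>();
            if (slot.wildcard) {
                // Terms sharing a prefix are contiguous in sorted order,
                // so scan from the prefix and stop at the first mismatch.
                for (String candidate : indexTerms.tailSet(slot.text)) {
                    if (!candidate.startsWith(slot.text)) {
                        break;
                    }
                    alternatives.add(candidate);
                }
            } else {
                alternatives.add(slot.text);
            }
            positions.add(alternatives);
        }
        return positions;
    }

    public static void main(String[] args) {
        SortedSet<String> indexTerms = new TreeSet<>(
                Arrays.asList("blueberry", "pie", "pizza", "strudel"));
        PhrasePrefixSketch query = new PhrasePrefixSketch();
        query.add("blueberry");
        query.addWildcardTerm("pi");
        System.out.println(query.expand(indexTerms));
    }
}
```

The caller only records intent through add and addWildcardTerm; nothing that
needs the index happens until expand is called.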

This change also cleans up the PhrasePrefixQuery class a bit, and
hopefully Konrad can build his highlighting code and the QueryParser.jj
integration on top of it.

Best Regards,
Anders Nielsen

-----Original Message-----
From: Konrad Scherer
To: Lucene Developers List
Sent: 20-11-2002 20:46
Subject: Re: New PhrasePrefixQuery.java


>
>I don't like extending Term.  An instance of a subclass should make sense
>anywhere its base class is, and that is not really the case here.  A
>WildcardTerm should not in general be passed to IndexReader methods,
>etc.  It looks like you've hacked around this, so that it won't actually
>crash, but this doesn't strike me as an appropriate use of subclassing.

I agree that it wasn't very elegant.

>I think it would be good to get this functionality into the Query
>parser.  There is currently a gap between what is trivially available in
>the query parser (strings with wildcard characters) and the
>PhrasePrefixQuery API (an array of terms).  What it seems to me is needed
>is just a utility method somewhere that expands a wildcarded string into
>an array of terms.  This is probably best done in
>PhrasePrefixQuery.scorer, when an IndexReader is available.  So the
>approach I would suggest is extending the API of PhrasePrefixQuery with a
>method like:
>   PhrasePrefixQuery.addTermPrefix(Term term);
>or
>   PhrasePrefixQuery.addWildcardTerm(Term term);
>where the term.text() contains either a term prefix or a wildcard
>pattern.  Then, in the scorer() implementation this can be expanded.
>PhrasePrefixQuery would then need to do some bookkeeping to identify which
>terms need expansion.
>
>Does this make sense?
Yes, it makes sense, but there is a problem. To expand a wildcard, an
IndexReader is necessary. I chose the prepare method because then the
wildcard term can be expanded before the function sumOfSquaredWeights is
called. This function requires the wildcard term to be expanded already. The
relevant code follows:

Term[] terms = ((Term)o).getTerms();
for (int j=0; j<terms.length; j++) {
     _idf += searcher.getSimilarity().idf(terms[j], searcher);
}
I must admit to not understanding the weighting system at all; I haven't
taken the time to think about it yet. Is it necessary to have all the terms
for the weighting system to work? It would be strange to expand the
wildcard within this function even if it were possible to retrieve an
IndexReader from the IndexSearcher. If the math can be redone to avoid
needing the expansion of the wildcard term, then I will create a new version
of PhrasePrefixQuery that will expand the term within the scorer. That
would do away with WildcardTerm (and changes to Term) entirely.
Thank you

Konrad


--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>

PhrasePrefixQuery.diff
Description: Binary data
