Hello Konrad & Doug,

How about something like this? I've made two internal classes, PhraseTerm
and PhraseWildcardTerm, to encapsulate the different handling of the two
kinds of terms inside the PhraseQuery class.
The external interface of PhrasePrefixQuery has been changed so that
add(Term[] terms) becomes addWildcardTerm(Term term) as per Doug's
suggestion.
When scorer() is called on a PhrasePrefixQuery, the terms inside each
PhraseWildcardTerm object are expanded.
This change also cleans up the PhrasePrefixQuery class a bit, and
hopefully Konrad can build his highlighting code and the QueryParser.jj
integration on top of it.
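To make the split concrete, here is a minimal, self-contained sketch of the idea. It is not the code in the attached diff. A sorted set of strings stands in for the index's term dictionary (the real version would enumerate terms through an IndexReader), terms are plain strings rather than Term objects, and the names PhraseElement, expand() and expandAll() are hypothetical. Each phrase position is either a PhraseTerm, which expands to itself, or a PhraseWildcardTerm, which is matched against the dictionary only when expandAll() is called, i.e. at the point where scorer() would have a reader available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical sketch: a SortedSet<String> stands in for the index's term
// dictionary; the real code would enumerate terms via an IndexReader.
final class PhraseElementsSketch {

    // Common internal type for one position in the phrase.
    interface PhraseElement {
        // Returns the concrete terms this position matches in the dictionary.
        String[] expand(SortedSet<String> dictionary);
    }

    // A plain term: expands to itself, no dictionary lookup needed.
    static final class PhraseTerm implements PhraseElement {
        private final String text;
        PhraseTerm(String text) { this.text = text; }
        public String[] expand(SortedSet<String> dictionary) {
            return new String[] { text };
        }
    }

    // A wildcard term: '*' matches any run of characters (including none),
    // '?' matches exactly one character.
    static final class PhraseWildcardTerm implements PhraseElement {
        private final String pattern;
        PhraseWildcardTerm(String pattern) { this.pattern = pattern; }
        public String[] expand(SortedSet<String> dictionary) {
            List<String> matches = new ArrayList<>();
            for (String t : dictionary) {        // dictionary is iterated in sorted order
                if (matches(pattern, 0, t, 0)) matches.add(t);
            }
            return matches.toArray(new String[0]);
        }
        // Simple recursive matcher; fine for a sketch, not tuned for speed.
        private static boolean matches(String p, int pi, String s, int si) {
            if (pi == p.length()) return si == s.length();
            char c = p.charAt(pi);
            if (c == '*') {
                for (int k = si; k <= s.length(); k++) {
                    if (matches(p, pi + 1, s, k)) return true;
                }
                return false;
            }
            if (si == s.length()) return false;
            if (c == '?' || c == s.charAt(si)) return matches(p, pi + 1, s, si + 1);
            return false;
        }
    }

    private final List<PhraseElement> elements = new ArrayList<>();

    public void add(String term) { elements.add(new PhraseTerm(term)); }

    public void addWildcardTerm(String pattern) { elements.add(new PhraseWildcardTerm(pattern)); }

    // Called only once a dictionary (reader) is available, e.g. from scorer().
    public List<String[]> expandAll(SortedSet<String> dictionary) {
        List<String[]> result = new ArrayList<>();
        for (PhraseElement e : elements) result.add(e.expand(dictionary));
        return result;
    }
}
```

With a dictionary of apache, lucene, lucid, luck and query, adding "apache" and the wildcard "luc*" expands to {apache} followed by {lucene, lucid, luck}. Because both kinds of position share an internal interface, nothing has to subclass Term.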
Best Regards,
Anders Nielsen
-----Original Message-----
From: Konrad Scherer
To: Lucene Developers List
Sent: 20-11-2002 20:46
Subject: Re: New PhrasePrefixQuery.java
>
>I don't like extending Term. An instance of a subclass should make sense
>anywhere its base class is, and that is not really the case here. A
>WildcardTerm should not in general be passed to IndexReader methods,
>etc. It looks like you've hacked around this, so that it won't actually
>crash, but this doesn't strike me as an appropriate use of subclassing.
I agree that it wasn't very elegant.
>I think it would be good to get this functionality into the Query
>parser. There is currently a gap between what is trivially available in
>the query parser (strings with wildcard characters) and the
>PhrasePrefixQuery API (an array of terms). What it seems to me is needed
>is just a utility method somewhere that expands a wildcarded string into
>an array of terms. This is probably best done in
>PhrasePrefixQuery.scorer, when an IndexReader is available. So the
>approach I would suggest is extending the API of PhrasePrefixQuery with a
>method like:
> PhrasePrefixQuery.addTermPrefix(Term term);
>or
> PhrasePrefixQuery.addWildcardTerm(Term term);
>where the term.text() contains either a term prefix or a wildcard
>pattern. Then, in the scorer() implementation this can be expanded.
>PhrasePrefixQuery would then need to do some bookkeeping to identify which
>terms need expansion.
>
>Does this make sense?
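For the prefix variant of Doug's suggestion (addTermPrefix), the expansion can exploit the fact that the term dictionary is sorted: seek to the first term at or after the prefix, then stop at the first term that no longer starts with it. The sketch below assumes a NavigableSet<String> in place of the real term dictionary and ignores field names; in Lucene the seek would presumably go through IndexReader.terms(Term) and the returned TermEnum. TermPrefixExpander and expandPrefix are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical utility: a NavigableSet<String> stands in for the sorted term
// dictionary that an IndexReader exposes.
final class TermPrefixExpander {
    private TermPrefixExpander() {}

    // Returns every dictionary term starting with prefix, in sorted order.
    static String[] expandPrefix(NavigableSet<String> dictionary, String prefix) {
        List<String> out = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward.
        for (String term : dictionary.tailSet(prefix, true)) {
            // In sorted order all terms sharing a prefix are contiguous, so the
            // first non-matching term ends the scan.
            if (!term.startsWith(prefix)) break;
            out.add(term);
        }
        return out.toArray(new String[0]);
    }
}
```

The early exit means the cost is proportional to the number of matching terms rather than the size of the dictionary; a general wildcard pattern can only use its literal leading characters this way.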
Yes, it makes sense, but there is a problem: expanding a wildcard requires
an IndexReader. I chose the prepare method because it lets the wildcard
term be expanded before sumOfSquaredWeights is called, and that method
requires the wildcard term to be expanded already. The relevant code
follows:
Term[] terms = ((Term)o).getTerms();
for (int j=0; j<terms.length; j++) {
    _idf += searcher.getSimilarity().idf(terms[j], searcher);
}
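As far as I can tell, the loop above adds one idf contribution per expanded term, so the summed weight depends on each expanded term's document frequency; that is why the terms must be expanded before sumOfSquaredWeights runs. A minimal sketch of that arithmetic, assuming the idf formula of Lucene's DefaultSimilarity, which as far as I know is log(numDocs / (docFreq + 1)) + 1, with a plain map standing in for the searcher's document frequencies (ExpandedIdfSketch and summedIdf are hypothetical names):

```java
import java.util.Map;

// Hypothetical sketch of the summed-idf loop; a Map stands in for the
// document frequencies the searcher would supply.
final class ExpandedIdfSketch {
    private ExpandedIdfSketch() {}

    // Assumed DefaultSimilarity-style idf: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Mirrors the quoted loop: one idf contribution per expanded term.
    // Terms missing from the map are treated as having docFreq 0.
    static double summedIdf(String[] expandedTerms, Map<String, Integer> docFreqs, int numDocs) {
        double total = 0.0;
        for (String term : expandedTerms) {
            total += idf(docFreqs.getOrDefault(term, 0), numDocs);
        }
        return total;
    }
}
```

With numDocs = 99, a term occurring in 98 documents contributes exactly 1.0 and rarer terms contribute more, so the weight cannot be computed without knowing what the wildcard expands to unless the formula itself is changed.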
I must admit that I don't understand the weighting system at all; I
haven't taken the time to think about it yet. Is it necessary to have all
the terms for the weighting system to work? It would be strange to expand
the wildcard within this function, even if it were possible to retrieve an
IndexReader from the IndexSearcher. If the math can be redone so that the
wildcard term does not need to be expanded, I will create a new version of
PhrasePrefixQuery that expands the term within the scorer. That would do
away with WildcardTerm (and the changes to Term) entirely.
Thank you
Konrad
--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
PhrasePrefixQuery.diff
Description: Binary data
