I don't like extending Term. An instance of a subclass should make sense anywhere its base class is, and that is not really the case here. A WildcardTerm should not in general be passed to IndexReader methods, etc. It looks like you've hacked around this, so that it won't actually crash, but this doesn't strike me as an appropriate use of subclassing.
I agree that it wasn't very elegant.
I think it would be good to get this functionality into the query parser. There is currently a gap between what is trivially available in the query parser (strings with wildcard characters) and the PhrasePrefixQuery API (an array of terms). It seems to me that all that is needed is a utility method somewhere that expands a wildcarded string into an array of terms. This is probably best done in PhrasePrefixQuery.scorer, when an IndexReader is available. So the approach I would suggest is extending the API of PhrasePrefixQuery with a method like:
PhrasePrefixQuery.addTermPrefix(Term term);
or
PhrasePrefixQuery.addWildcardTerm(Term term);
where term.text() contains either a term prefix or a wildcard pattern. Then, in the scorer() implementation this can be expanded. PhrasePrefixQuery would then need to do some bookkeeping to identify which terms need expansion.
Does this make sense?
Yes, it makes sense, but there is a problem. To expand a wildcard, an IndexReader is necessary. I chose the prepare method because the wildcard term can then be expanded before sumOfSquaredWeights is called. That function requires the wildcard term to be expanded already. The relevant code follows:
Term[] terms = ((Term)o).getTerms();
for (int j=0; j<terms.length; j++) {
  _idf += searcher.getSimilarity().idf(terms[j], searcher);
}
I must admit to not understanding the weighting system at all; I haven't taken the time to think about it yet. Is it necessary to have all the terms for the weighting system to work? It would be strange to expand the wildcard within this function, even if it were possible to retrieve an IndexReader from the IndexSearcher. If the math can be redone so that it does not need the expanded wildcard term, then I will create a new version of PhrasePrefixQuery that expands the term within the scorer. That would do away with WildcardTerm (and the changes to Term) entirely.
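To make the "expand a wildcarded string into an array of terms" utility discussed above a bit more concrete, here is a rough sketch. It is plain Java and deliberately does not use any Lucene classes: the SortedSet stands in for the sorted term dictionary that an IndexReader would provide, and the class name WildcardExpansionSketch and the method expand are hypothetical names invented for this illustration, not part of the PhrasePrefixQuery API. It assumes '*' matches any run of characters and '?' matches exactly one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Lucene code.
class WildcardExpansionSketch {

    /**
     * Returns every term in the (sorted) dictionary that matches the pattern.
     * Assumption: '*' matches any run of characters, '?' matches one character.
     */
    public static List<String> expand(String pattern, SortedSet<String> termDictionary) {
        // Find the literal prefix that precedes the first wildcard character.
        int firstWildcard = 0;
        while (firstWildcard < pattern.length()
                && pattern.charAt(firstWildcard) != '*'
                && pattern.charAt(firstWildcard) != '?') {
            firstWildcard++;
        }
        String prefix = pattern.substring(0, firstWildcard);

        // Translate the wildcard pattern into a regular expression,
        // quoting every literal character.
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern compiled = Pattern.compile(regex.toString(), Pattern.DOTALL);

        // Terms sharing the prefix are contiguous in a sorted dictionary,
        // so start at the prefix and stop at the first term that lacks it.
        List<String> matches = new ArrayList<>();
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (compiled.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary =
                new TreeSet<>(Arrays.asList("term", "test", "tested", "text", "toast"));
        System.out.println(expand("te*", dictionary));   // [term, test, tested, text]
        System.out.println(expand("te?t", dictionary));  // [test, text]
    }
}
```

In this sketch the query would only need to remember the pattern string until a term dictionary is available, which is the kind of bookkeeping described above. Whether the weighting can wait that long is exactly the open question.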
Thank you
Konrad
