Konrad Scherer wrote:
> I have modified QueryParser.jj and PhrasePrefixQuery.java to allow wildcard searches within phrases. This turned out to be a very involved change going through a few revisions. I have tried to make the changes as clean as possible.
Thanks for taking the time to work on this.  I hope your patience continues.

> Some points
> 1) I created a WildcardTerm class which extends Term. Originally Term was final. My changes shouldn't affect anyone unless there is a reason Term must remain final which I have not noticed.
I don't like extending Term. An instance of a subclass should make sense anywhere its base class is, and that is not really the case here. A WildcardTerm should not in general be passed to IndexReader methods, etc. It looks like you've hacked around this, so that it won't actually crash, but this doesn't strike me as an appropriate use of subclassing.

> 2) PhrasePrefixQuery.java has been completely rewritten.
And you added meaningful comments! Bravo!

> Extending Term
> helped simplify this class considerably. A PhrasePrefixQuery is now a
> vector of Terms (or WildcardTerms). The wildcard Terms are expanded
> through the prepare() call from Query.

Unfortunately, the prepare() method has not proven to be a great way to do things. The problem is that, with MultiSearcher, it is called multiple times, once for each underlying IndexReader that is searched. If, for example, MultiSearcher spawned a thread to search each of the sub-indexes, then each thread's call to prepare() would modify the terms in the query in a different way, and the modifications would conflict. You could add some synchronization code to MultiTermQuery, but it's really better if all query invocation state is either on the stack or in the Scorer. I think just about every use of prepare() has resulted in a bug. Long term, I think this method should probably be removed.

The previous implementation managed without using prepare.
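
To illustrate keeping invocation state on the stack, here is a minimal sketch. It does not use the real Lucene classes: FakeIndex and PrefixQuerySketch are hypothetical stand-ins invented for this example. The query object is never modified, so the same instance can be expanded against several sub-indexes from several threads without synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StatelessExpansionSketch {

    /** Hypothetical stand-in for one sub-index: just a list of term texts. */
    static final class FakeIndex {
        final List<String> terms;
        FakeIndex(String... terms) { this.terms = Arrays.asList(terms); }
    }

    /** Hypothetical query that stores only what the user typed. */
    static final class PrefixQuerySketch {
        private final String prefix;
        PrefixQuerySketch(String prefix) { this.prefix = prefix; }

        /**
         * Expands the prefix against one index. The result lives in a
         * local variable and is returned; no field of the query changes.
         */
        List<String> expand(FakeIndex index) {
            List<String> matches = new ArrayList<>();
            for (String term : index.terms) {
                if (term.startsWith(prefix)) {
                    matches.add(term);
                }
            }
            return matches;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PrefixQuerySketch query = new PrefixQuerySketch("app");
        FakeIndex[] subIndexes = {
            new FakeIndex("apple", "banana"),
            new FakeIndex("apply", "cherry")
        };
        // One thread per sub-index, all sharing the same query object.
        Thread[] workers = new Thread[subIndexes.length];
        for (int i = 0; i < subIndexes.length; i++) {
            final FakeIndex index = subIndexes[i];
            workers[i] = new Thread(() -> System.out.println(query.expand(index)));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```

A prepare()-style design would instead store the expanded terms in a field of the query, which is where the threads would step on each other.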

I think it would be good to get this functionality into the query parser. There is currently a gap between what is trivially available in the query parser (strings with wildcard characters) and the PhrasePrefixQuery API (an array of terms). It seems to me that all that is needed is a utility method somewhere that expands a wildcarded string into an array of terms. This is probably best done in PhrasePrefixQuery.scorer, when an IndexReader is available. So the approach I would suggest is extending the API of PhrasePrefixQuery with a method like:
PhrasePrefixQuery.addTermPrefix(Term term);
or
PhrasePrefixQuery.addWildcardTerm(Term term);
where the term.text() contains either a term prefix or a wildcard pattern. Then, in the scorer() implementation this can be expanded. PhrasePrefixQuery would then need to do some bookkeeping to identify which terms need expansion.
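
As a rough sketch of the bookkeeping this implies (not the actual PhrasePrefixQuery code), each phrase position could record whether it is exact, a prefix, or a wildcard pattern, and be expanded only when the index is at hand. Everything here is an assumption made for illustration: the class PhrasePrefixSketch, its Kind enum, and expand() are hypothetical, plain strings stand in for Term, a list of strings stands in for the IndexReader, and `*` and `?` are assumed to be the wildcard characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PhrasePrefixSketch {

    /** Bookkeeping: how the text at each phrase position should be read. */
    enum Kind { EXACT, PREFIX, WILDCARD }

    static final class Slot {
        final String text;
        final Kind kind;
        Slot(String text, Kind kind) { this.text = text; this.kind = kind; }
    }

    private final List<Slot> slots = new ArrayList<>();

    void addTerm(String text)            { slots.add(new Slot(text, Kind.EXACT)); }
    void addTermPrefix(String prefix)    { slots.add(new Slot(prefix, Kind.PREFIX)); }
    void addWildcardTerm(String pattern) { slots.add(new Slot(pattern, Kind.WILDCARD)); }

    /** Translates '*' (any run of characters) and '?' (one character) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /**
     * What a scorer() implementation might do first: turn every position
     * into its list of alternative terms. The slots are read, never changed.
     */
    List<List<String>> expand(List<String> indexTerms) {
        List<List<String>> positions = new ArrayList<>();
        for (Slot slot : slots) {
            List<String> alternatives = new ArrayList<>();
            switch (slot.kind) {
                case EXACT: {
                    alternatives.add(slot.text);
                    break;
                }
                case PREFIX: {
                    for (String term : indexTerms) {
                        if (term.startsWith(slot.text)) alternatives.add(term);
                    }
                    break;
                }
                case WILDCARD: {
                    Pattern pattern = toRegex(slot.text);
                    for (String term : indexTerms) {
                        if (pattern.matcher(term).matches()) alternatives.add(term);
                    }
                    break;
                }
            }
            positions.add(alternatives);
        }
        return positions;
    }

    public static void main(String[] args) {
        PhrasePrefixSketch query = new PhrasePrefixSketch();
        query.addTerm("quick");
        query.addTermPrefix("bro");
        query.addWildcardTerm("f?x*");
        List<String> indexTerms = Arrays.asList("brown", "broth", "fox", "foxes", "fix", "quick");
        System.out.println(query.expand(indexTerms));
    }
}
```

Because expand() returns its result instead of storing it, the same query can be expanded separately for each IndexReader.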

Does this make sense?

Doug


