Re: MultiTermQuery question

Doug Cutting Tue, 25 Feb 2003 09:37:17 -0800

none none wrote:

On Mon, 24 Feb 2003 10:04:30 Doug Cutting wrote:
Perhaps MultiTermQuery.getEnum() should be changed from protected to public. Would that work for you?
I don't know, I guess so. I believe it should be public, since I need to call it from the Lucene highlighter. So what I'll have to do is:
- call getEnum()
- iterate while FilteredTermEnum.next() is true
- call getTerm() to get the current Term
- add the term.text() to a vector
- end loop
- use this vector of text terms inside the highlighter tool
Am I right? If so, do you think it will be slower than before?

Yes, that looks right, and no it should be no slower than before.

Perhaps this should be added as a method to MultiTermQuery, something like:

public Term[] getMatchingTerms(IndexReader);

The important thing is that it is parameterized by an IndexReader, which the old getQuery() method was not.
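To make the shape of such a method concrete, here is a minimal sketch of the collect-into-a-list loop. It does not use Lucene classes: TermCursor and prefixCursor are hypothetical stand-ins for the term enumeration and for an IndexReader's sorted term list, and matchingTerms is only an illustration of what a getMatchingTerms(IndexReader) might do internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MatchingTermsSketch {

    /** Hypothetical stand-in for a filtered term enumeration; not Lucene's class. */
    public interface TermCursor {
        boolean next();   // advance to the next matching term; false when exhausted
        String text();    // text of the current term
    }

    /**
     * Hypothetical stand-in for asking an index for the terms that match a prefix.
     * The list plays the role of the IndexReader: the result depends on it.
     */
    public static TermCursor prefixCursor(List<String> indexedTerms, String prefix) {
        Iterator<String> it = indexedTerms.iterator();
        return new TermCursor() {
            private String current;

            public boolean next() {
                while (it.hasNext()) {
                    String candidate = it.next();
                    if (candidate.startsWith(prefix)) {
                        current = candidate;
                        return true;
                    }
                }
                current = null;
                return false;
            }

            public String text() {
                return current;
            }
        };
    }

    /** Walks the cursor once and collects the text of every matching term. */
    public static List<String> matchingTerms(TermCursor cursor) {
        List<String> out = new ArrayList<>();
        while (cursor.next()) {
            out.add(cursor.text());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList("bar", "fool", "foosball", "zebra");
        System.out.println(matchingTerms(prefixCursor(index, "foo"))); // prints [fool, foosball]
    }
}
```

The same prefix gives a different answer against a different term list, which is why the method needs the reader as a parameter.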

This is actually similar to what Tatu was proposing with his Query.collectTerms() API. However with a prefix, wildcard or other expanded query term, the set of terms is only defined in the context of an IndexReader.

So, Tatu, if you do get to implementing your proposal, please take this into account. There are potentially two different things that folks might want: (1) the list of terms which are literally in the query, e.g., "foo*" for a wildcard query; (2) the list of indexed terms which match clauses in the query, e.g., "fool" and "foosball" for the query "foo*". The latter is considerably more expensive to compute, but might be more useful in term highlighting. (Note that you could do term highlighting without this, by matching terms in the text directly against the wildcard pattern.)
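As a rough illustration of that last alternative, the sketch below highlights text by matching it directly against the wildcard pattern, with no index involved. It assumes "*" means any run of word characters and "?" means a single word character. The class and method names are hypothetical, and it ignores whatever analysis (stemming, stop words) was applied at indexing time, so its matches may differ from what the index would return.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardHighlightSketch {

    /** Translates a wildcard term such as "foo*" into a whole-word regex. */
    public static Pattern toPattern(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append("\\w*");           // any run of word characters
            } else if (c == '?') {
                sb.append("\\w");            // exactly one word character
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile("\\b" + sb + "\\b", Pattern.CASE_INSENSITIVE);
    }

    /** Wraps every word in the text that matches the wildcard in <b> tags. */
    public static String highlight(String text, String wildcard) {
        Matcher m = toPattern(wildcard).matcher(text);
        return m.replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("The fool played foosball at the bar.", "foo*"));
        // prints: The <b>fool</b> played <b>foosball</b> at the bar.
    }
}
```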

Doug

