I think we have different problems here:

Carsten just wants to collect the terms an MTQ (MultiTermQuery) visits, so
using BooleanQuery to do this is fine unless you hit the clause limit. If you
don’t execute the query, the limit can be set as high as you like (but it’s a
static limit affecting all instances). Another approach achieves the same
thing: implement your own TermCollectingRewrite subclass that simply adds the
collected terms to a custom HashSet or whatever. You just have to implement
the addClause and getTopLevelQuery methods in TermCollectingRewrite and return
the set later (just use a "fake" query as holder for the HashSet). I did
something similar in the past to implement a MultiPhraseQuery with MTQs like
wildcards, regexes or fuzzies as clauses (I hope I can donate it soon). The
custom rewrite would be the most efficient way to get the list of terms (if
you rely on a query as input).
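
To make the custom-rewrite idea concrete, here is a rough plain-Java sketch of
the pattern. It deliberately does not use Lucene's classes: TermCollectingSketch,
TermSetHolder and SetCollectingRewrite are hypothetical stand-ins, the term
dictionary is just an Iterable<String>, and java.util.regex replaces the
automaton. The real addClause in Lucene takes more arguments than shown here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for a term-collecting rewrite; NOT Lucene's API.
abstract class TermCollectingSketch<Q> {
    // Creates the object that will receive the collected terms.
    protected abstract Q getTopLevelQuery();

    // Called once for every term the multi-term query matches.
    protected abstract void addClause(Q topLevel, String term);

    // Walks the term dictionary and hands each matching term to addClause.
    final Q rewrite(Iterable<String> termDictionary, Pattern pattern) {
        Q topLevel = getTopLevelQuery();
        for (String term : termDictionary) {
            if (pattern.matcher(term).matches()) {
                addClause(topLevel, term);
            }
        }
        return topLevel;
    }
}

// The "fake" query: it is never executed, it only holds the terms.
final class TermSetHolder {
    final Set<String> terms = new LinkedHashSet<>();
}

// Collects terms into a set instead of building boolean clauses,
// so no clause limit applies.
final class SetCollectingRewrite extends TermCollectingSketch<TermSetHolder> {
    @Override
    protected TermSetHolder getTopLevelQuery() {
        return new TermSetHolder();
    }

    @Override
    protected void addClause(TermSetHolder topLevel, String term) {
        topLevel.terms.add(term);
    }
}
```

Calling new SetCollectingRewrite().rewrite(dictionary, pattern).terms then
gives you the set of visited terms without ever building a BooleanQuery.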

On the other hand, to collect all terms for a wildcard, don’t use the Query at
all: just wrap the reader's TermsEnum in one of Lucene's filtering enums, like
AutomatonTermsEnum (in current 4.x it lives in the index package and takes the
TermsEnum plus a compiled automaton in its ctor). It filters all terms in the
index according to the automaton (which may be built from a regex).
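
The query-free approach can be sketched in plain Java as well. Again this is
only an illustration under assumptions, not Lucene code: a sorted
NavigableSet<String> stands in for the reader's term dictionary,
java.util.regex.Pattern stands in for the automaton, and the literalPrefix
parameter is a hypothetical hint that lets the scan seek to the first candidate
and stop as soon as it leaves the prefix range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.regex.Pattern;

// Hypothetical helper; it mimics filtering a sorted terms enum, nothing more.
final class RegexTermFilterSketch {

    // Returns all terms matching the pattern, in sorted order.
    // literalPrefix must be a prefix of every possible match ("" if unknown).
    static List<String> matchingTerms(NavigableSet<String> sortedTerms,
                                      String literalPrefix,
                                      Pattern pattern) {
        List<String> hits = new ArrayList<>();
        // Seek directly to the first term >= prefix instead of scanning from the start.
        for (String term : sortedTerms.tailSet(literalPrefix, true)) {
            if (!term.startsWith(literalPrefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

The point of the sketch is that no query object and no clause limit are
involved; you only iterate the terms and keep the ones that match.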

Finally, if you actually want to execute the query, using a scoring rewrite is
a bad idea, and even 1024 clauses is already too many.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Monday, March 11, 2013 6:23 PM
> To: java-user@lucene.apache.org
> Subject: Re: Rewrite for RegexpQuery
> 
> On Mon, Mar 11, 2013 at 9:32 AM, Carsten Schnober
> <schnober@ids-mannheim.de> wrote:
> > On 11.03.2013 13:38, Michael McCandless wrote:
> >> On Mon, Mar 11, 2013 at 7:08 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >>
> >>> Set the rewrite method to e.g. SCORING_BOOLEAN_QUERY_REWRITE, then
> >>> this should work (after rewrite your query is a BooleanQuery, which
> >>> supports extractTerms()).
> >>
> >> ... as long as you don't exceed the max number of terms allowed by BQ
> >> (1024 by default, but you can raise it).
> >
> > True, I've noticed this meanwhile. Are there any recommendations for
> > this setting where the limit is as large as possible while staying
> > within a reasonable performance? Of course, this is highly subjective,
> > but what's the magnitude here? Will a limit of 1,024,000 typically
> > increase the query time by the factor 1,000 too?
> > Carsten
> 
> I think 1024 may already be too high ;)
> 
> But really it depends on your situation: test different limits and see.
> 
> How much slower a larger query is depends on the specifics of the terms ...
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

