Hi,

> Hi,
> I'm trying to get the terms that match a certain RegexpQuery. My (naive)
> approach:
> 
> 1. Create a RegexpQuery from the queryString (e.g. "abc.*"):
> Query q = new RegexpQuery(new Term("text", queryString));
> 
> 2. Rewrite the Query using the IndexReader reader:
> q = q.rewrite(reader);

This works for this query, but in general you have to call rewrite() repeatedly
until the query is completely rewritten: use a while loop that exits when
rewrite() returns the very same query object it was given.
IndexSearcher.rewrite() does this for you.
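
A minimal sketch of such a loop, in case you want to do it by hand. The class
and method names (RewriteUtil, rewriteFully) are made up for illustration; only
Query.rewrite(IndexReader) is the actual Lucene 4.0 call:

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;

// Hypothetical helper class, not part of Lucene.
public final class RewriteUtil {

    private RewriteUtil() {}

    /**
     * Calls rewrite() until it returns the same instance it was called on,
     * i.e. until the query is in its most primitive form.
     */
    public static Query rewriteFully(Query query, IndexReader reader) throws IOException {
        Query rewritten = query.rewrite(reader);
        while (rewritten != query) {   // identity check, not equals()
            query = rewritten;
            rewritten = query.rewrite(reader);
        }
        return query;
    }
}
```

In practice, new IndexSearcher(reader).rewrite(q) should give you the same
result without the extra helper.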

> 3. Write the terms into a previously initialized empty set terms:
> Set<Term> terms = new HashSet<>();
> q.extractTerms(terms);

Set the rewrite method to e.g. MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE
before rewriting, then this should work: after the rewrite your query is a
BooleanQuery, which supports extractTerms().
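
Put together, the steps could look roughly like this. It is a sketch against
Lucene 4.0 that I have not run against your index; the class name, the
matchingTerms() and demoTerms() helpers and the sample text are assumptions for
illustration, and WhitespaceAnalyzer needs the analyzers-common jar:

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RegexpQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class, not part of Lucene.
public final class RegexpTermsExample {

    /** Collects the indexed terms of the given field that match the regexp. */
    public static Set<Term> matchingTerms(IndexReader reader, String field, String regexp)
            throws IOException {
        RegexpQuery query = new RegexpQuery(new Term(field, regexp));
        // Rewrite to a BooleanQuery of TermQuery clauses instead of a constant-score filter.
        query.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        // IndexSearcher.rewrite() repeats the rewrite until nothing changes.
        Query rewritten = new IndexSearcher(reader).rewrite(query);
        Set<Term> terms = new TreeSet<Term>();
        rewritten.extractTerms(terms);
        return terms;
    }

    /** Builds a tiny in-memory index (made-up sample data) and queries it. */
    public static Set<Term> demoTerms() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_40, new WhitespaceAnalyzer(Version.LUCENE_40)));
        Document doc = new Document();
        doc.add(new TextField("text", "abc abcd xyz", Field.Store.NO));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        try {
            return matchingTerms(reader, "text", "abc.*");
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Should list the terms "abc" and "abcd" of field "text", but not "xyz".
        System.out.println(demoTerms());
    }
}
```

One caveat: this rewrite method throws BooleanQuery.TooManyClauses if the
regexp matches more terms than BooleanQuery.getMaxClauseCount() allows (1024 by
default), so for very broad patterns you may have to raise that limit.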

> However, this results in an empty set. I believe this is due to the fact that
> the rewritten query is a ConstantScoreQuery object;
> q.extractTerms(terms) does not yield any terms anyway. q.getQuery()
> returns null however; according to the documentation, this should happen
> when it wraps a filter which it does not, supposedly.

It does wrap a filter: MultiTermQueryWrapperFilter. That is why getQuery()
returns null.

> This is Lucene 4.0. Any hints?
> Thanks!
> Carsten
> 
> 
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

