Hi,

> Hi,
>
> I'm trying to get the terms that match a certain RegexpQuery. My (naive)
> approach:
>
> 1. Create a RegexpQuery from the queryString (e.g. "abc.*"):
>    Query q = new RegexpQuery(new Term("text", queryString));
>
> 2. Rewrite the Query using the IndexReader reader:
>    q = q.rewrite(reader);

This works for this query, but in general you have to rewrite repeatedly until
the query is completely rewritten: a loop that exits as soon as rewrite()
returns the same query instance it was called on. IndexSearcher.rewrite() does
this for you.

> 3. Write the terms into a previously initialized empty set terms:
>    Set<Term> terms = new HashSet<>();
>    q.extractTerms(terms);

Set the rewrite method to, e.g., MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE;
then this should work, because after the rewrite your query is a BooleanQuery,
which supports extractTerms().

> However, this results in an empty set. I believe this is due to the fact that
> the rewritten query is a ConstantScoreQuery object;
> q.extractTerms(terms) does not yield any terms anyway. q.getQuery()
> returns null however; according to the documentation, this should happen
> when it wraps a filter which it does not, supposedly.

It does wrap a filter: a MultiTermQueryWrapperFilter. That is why getQuery()
returns null and why extractTerms() finds nothing.

> This is Lucene 4.0. Any hints?
>
> Thanks!
> Carsten
>
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
> Next Generation Corpus Analysis Platform
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
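For readers without a Lucene index at hand, here is a minimal sketch of the two points in this reply: rewriting until a fixpoint (the loop IndexSearcher.rewrite() performs) and why extractTerms() only yields terms for the boolean form, not for the filter-wrapping form. It is a toy model in plain Java, not Lucene code: ToyQuery, ToyRegexpQuery, ToyBooleanQuery, ToyFilterQuery, ToyWrapperQuery and ToyRewriter are hypothetical names, terms are plain Strings, the "index" is a SortedSet of terms, and java.util.regex stands in for Lucene's own RegExp syntax (the two agree on a simple pattern like "abc.*").

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model of the Query.rewrite()/extractTerms() contract, NOT the Lucene API.
// Terms are plain Strings and the "index" is just a sorted set of terms.
interface ToyQuery {
    // Returns this same instance once there is nothing left to rewrite.
    ToyQuery rewrite(SortedSet<String> termDict);

    // Adds the terms this query is made of; may add nothing.
    void extractTerms(Set<String> terms);
}

// Stands in for a BooleanQuery of term clauses: fully rewritten, terms visible.
final class ToyBooleanQuery implements ToyQuery {
    private final Set<String> clauses;
    ToyBooleanQuery(Set<String> clauses) { this.clauses = clauses; }
    public ToyQuery rewrite(SortedSet<String> termDict) { return this; }
    public void extractTerms(Set<String> terms) { terms.addAll(clauses); }
}

// Stands in for a ConstantScoreQuery wrapping a filter: fully rewritten, but
// the matching terms stay hidden inside the filter, so nothing is extracted.
final class ToyFilterQuery implements ToyQuery {
    public ToyQuery rewrite(SortedSet<String> termDict) { return this; }
    public void extractTerms(Set<String> terms) { }
}

// Stands in for RegexpQuery; the (hypothetical) rewrite method decides what rewrite() returns.
final class ToyRegexpQuery implements ToyQuery {
    enum Method { SCORING_BOOLEAN, CONSTANT_SCORE_FILTER }

    private final Pattern pattern;
    private final Method method;

    ToyRegexpQuery(String regex, Method method) {
        this.pattern = Pattern.compile(regex); // java.util.regex, not Lucene's RegExp syntax
        this.method = method;
    }

    public ToyQuery rewrite(SortedSet<String> termDict) {
        if (method == Method.CONSTANT_SCORE_FILTER) {
            return new ToyFilterQuery();
        }
        // Enumerate the term dictionary and keep every term the pattern fully matches.
        Set<String> matching = new LinkedHashSet<>();
        for (String term : termDict) {
            if (pattern.matcher(term).matches()) {
                matching.add(term);
            }
        }
        return new ToyBooleanQuery(matching);
    }

    public void extractTerms(Set<String> terms) { } // not rewritten yet: no terms
}

// A wrapper that needs one extra round, to show why a single rewrite() can fall short.
final class ToyWrapperQuery implements ToyQuery {
    private final ToyQuery inner;
    ToyWrapperQuery(ToyQuery inner) { this.inner = inner; }
    public ToyQuery rewrite(SortedSet<String> termDict) { return inner; }
    public void extractTerms(Set<String> terms) { }
}

final class ToyRewriter {
    // Loop until rewrite() hands back the very same instance (identity, not equals()).
    static ToyQuery rewriteFully(ToyQuery query, SortedSet<String> termDict) {
        for (ToyQuery r = query.rewrite(termDict); r != query; r = query.rewrite(termDict)) {
            query = r;
        }
        return query;
    }

    static Set<String> matchingTerms(ToyQuery query, SortedSet<String> termDict) {
        Set<String> terms = new TreeSet<>();
        rewriteFully(query, termDict).extractTerms(terms);
        return terms;
    }
}
```

With the dictionary {abc, abcd, abx, xabc}, a ToyWrapperQuery around ToyRegexpQuery("abc.*", SCORING_BOOLEAN) exposes no terms after one rewrite() but {abc, abcd} after rewriteFully(), while the CONSTANT_SCORE_FILTER variant yields an empty set, mirroring the empty result described above. Against the real Lucene 4.0 API the corresponding calls (not compiled here) would be along the lines of setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE) on the RegexpQuery, then searcher.rewrite(q).extractTerms(terms). One caveat of the scoring boolean rewrite: if the pattern matches more terms than BooleanQuery.getMaxClauseCount() (1024 by default), it throws BooleanQuery.TooManyClauses.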