Term Statistics for MultiTermQuery

Carsten Schnober Tue, 12 Mar 2013 10:29:39 -0700

Hi,
here's another question involving MultiTermQuerys. My aim is to get a
frequency count for a MultiTermQuery without actually executing the
query. The naive approach would be to create the Query, extract the
terms, and get each term's frequency, approximately as follows:


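IndexSearcher searcher = ...;
PrefixQuery query = new PrefixQuery(new Term("field", "abc"));
// rewrite() expands the prefix query against the index
Query rewritten = searcher.rewrite(query);
// extractTerms() fills the set passed in; it does not return one
Set<Term> terms = new HashSet<Term>();
rewritten.extractTerms(terms);
...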

I would then read the frequency of each term. However, this seems
rather costly for a large number of terms, and since I am only
interested in the total frequency, a term-by-term analysis should not
be necessary.
My use case is that I have an index containing part-of-speech tags in
the form <tag>:<token>, and I want to look up the total frequency of a
given <tag>.
My alternative solution would be to create a dedicated index in which
the original tokens are completely replaced by the tags, so that I
would have documents of the form "DET NN ..." with the tags themselves
as tokens. Would you rather recommend this?
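
To make the two options concrete, here is a minimal plain-Java sketch. It is
not Lucene code: a TreeMap stands in for the sorted term dictionary, and the
class name, method names and sample counts are all made up for illustration.
It assumes every term contains a ':' and that tags themselves contain none.
prefixTotal() mimics the first option (walk all <tag>:<token> terms sharing a
prefix and sum their frequencies), tagOnly() mimics the dedicated tag index,
where the total becomes a single lookup.

```java
import java.util.Map;
import java.util.TreeMap;

public class TagFrequencySketch {

    /** Option 1: sum the frequencies of all terms that start with the prefix. */
    public static long prefixTotal(TreeMap<String, Long> termFreqs, String prefix) {
        long total = 0;
        // tailMap starts at the first term >= prefix, similar to seeking
        // into a sorted term dictionary.
        for (Map.Entry<String, Long> e : termFreqs.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so the prefix range has ended
            }
            total += e.getValue();
        }
        return total;
    }

    /** Option 2: collapse <tag>:<token> terms into one entry per tag. */
    public static TreeMap<String, Long> tagOnly(TreeMap<String, Long> termFreqs) {
        TreeMap<String, Long> tags = new TreeMap<>();
        for (Map.Entry<String, Long> e : termFreqs.entrySet()) {
            // Assumption: the first ':' separates the tag from the token.
            String tag = e.getKey().substring(0, e.getKey().indexOf(':'));
            tags.merge(tag, e.getValue(), Long::sum);
        }
        return tags;
    }

    /** Hypothetical sample data in the <tag>:<token> form. */
    public static TreeMap<String, Long> sample() {
        TreeMap<String, Long> d = new TreeMap<>();
        d.put("DET:the", 3L);
        d.put("DET:a", 2L);
        d.put("NN:dog", 1L);
        d.put("NN:cat", 4L);
        d.put("VB:runs", 1L);
        return d;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> dict = sample();
        System.out.println("DET via prefix walk: " + prefixTotal(dict, "DET:"));
        System.out.println("DET via tag index:   " + tagOnly(dict).get("DET"));
    }
}
```

The prefix walk costs one step per distinct <tag>:<token> term, whereas the
tag-only variant pays that cost once at indexing time, which is the
trade-off I am asking about.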

Thanks,
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
