Thanks, but TermsEnum has two methods that return frequency-related
info, and both are corpus-level, not document-specific:
- docFreq(): returns the number of documents containing the current term.
- totalTermFreq(): returns the total number of occurrences of this term
  across all documents (the sum of freq() for each doc that has this
  term).
However, I need the document-specific frequency, i.e., the freq of term A in
Doc 1, 2, ... N.
Thanks
On 20/09/2015 15:07, Uwe Schindler wrote:
Hi,
With the terms enum you can iterate over all terms. Each one returns its term
frequency. Of course, you need to enable term vectors during indexing. The
pattern for using the terms enum can be found in various places in the Lucene
source code. It's a very expert API, but it is the way to go here.
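As a rough sketch of that pattern, assuming Lucene 5.3 and a field indexed
with term vectors: the Terms returned by getTermVector(docID, field) covers a
single document, so on its TermsEnum totalTermFreq() should be the count
within that document rather than a corpus-wide sum. The class and method
names (DocTermFreqs, termFreqs) are made up for illustration.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class DocTermFreqs {

    /**
     * Hypothetical helper: maps each term of one document's field to its
     * frequency in that document only. Assumes term vectors were enabled
     * for the field at index time.
     */
    public static Map<String, Long> termFreqs(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, Long> freqs = new LinkedHashMap<>();

        // Null when the field has no term vector stored for this document.
        Terms vector = reader.getTermVector(docId, field);
        if (vector == null) {
            return freqs;
        }

        // No-argument iterator(), as in Lucene 5.2 and later.
        TermsEnum termsEnum = vector.iterator();
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // The term vector acts like a one-document index, so the
            // "total" here is the number of occurrences in this document.
            freqs.put(term.utf8ToString(), termsEnum.totalTermFreq());
        }
        return freqs;
    }
}
```

Called as DocTermFreqs.termFreqs(reader, docID, "body") (the field name is
only an example), this returns an empty map when no term vector exists for
that document and field.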
Uwe
On 20 September 2015 15:35:40 CEST, Ziqi Zhang
<ziqi.zh...@sheffield.ac.uk> wrote:
Hi
Is it possible to get a list of the terms within a document, and also the TF
of each of these terms *in that document only*? (Lucene 5.3)
IndexReader has a method "Terms getTermVector(int docID, String field)",
which gives me a "Terms" object, from which I can get a TermsEnum. But I
do not know where to go from there.
thanks
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
--
Ziqi Zhang
Research Associate
Department of Computer Science
University of Sheffield