Hi Michael,

Thanks for the explanation. I am working with a TREC dataset; since it is static, I set the size of that array experimentally.
I followed the DefaultSimilarity#lengthNorm method a bit. If the default similarity is used and there is no index-time boost, I assume the norm equals 1.0 / Math.sqrt(numTerms).

The first option is to somehow obtain the pre-computed norm value and apply the reverse operation to recover numTerms:

numTerms = (1/norm)^2

This will only be an approximation, because norms are stored in a single byte. How do I access that norm value for a given docid and field?

The second option is to store numTerms as a separate field, like any other organic field. Do I need to calculate it myself, or can I access the numTerms value that is already computed during indexing?

I think I will follow the second option. Is there a pointer to an example that demonstrates reading/writing a DocValues-based field?

Thanks,
Ahmet

On Friday, February 6, 2015 11:08 AM, Michael McCandless <[email protected]> wrote:

How will you know how large to allocate that array? The within-doc term freq can in general be arbitrarily large...

Lucene does not directly store the total number of terms in a document, but it does store it approximately in the doc's norm value. Maybe you can use that? Alternatively, you can store this statistic yourself, e.g. as a doc value.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Feb 5, 2015 at 7:24 PM, Ahmet Arslan <[email protected]> wrote:
> Hello Lucene Users,
>
> I am traversing all documents that contain a given term with the following code:
>
> Term term = new Term(field, word);
> Bits bits = MultiFields.getLiveDocs(reader);
> DocsEnum docsEnum = MultiFields.getTermDocsEnum(reader, bits, field, term.bytes());
>
> while (docsEnum.nextDoc() != DocsEnum.NO_MORE_DOCS) {
>
>     array[docsEnum.freq()]++;
>
>     // how to retrieve term count for this document?
>     xxxxx(docsEnum.docID(), field);
>
> }
>
> How can I get field term count values for these documents using Lucene 4.10.3?
>
> Is the above code OK for traversing the posting list of a term?
>
> Thanks,
> Ahmet
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> ---------------------------------------------------------------------
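The first option (inverting the norm) can be sanity-checked without building an index. Below is a minimal sketch, assuming the default similarity with discountOverlaps left at its default and no index-time boost. It is a hypothetical standalone class (NormRoundTrip and its methods are illustrative names, not Lucene API) that copies, as I understand them, the arithmetic of Lucene 4.x DefaultSimilarity#lengthNorm and of SmallFloat.floatToByte315/byte315ToFloat, which DefaultSimilarity uses to pack the norm into one byte. It then applies numTerms = (1/norm)^2 to the decoded value.

```java
// Hypothetical standalone sketch: NOT Lucene API. It re-implements the
// norm arithmetic so the precision loss of the one-byte norm is visible.
class NormRoundTrip {

    // lengthNorm without an index-time boost: 1 / sqrt(numTerms)
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Mirrors SmallFloat.floatToByte315: drops the low 21 bits of the float
    // and rebases the exponent so the result fits in a single byte.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        int fzero = (63 - 15) << 3;
        if (smallfloat <= fzero) {
            // zero/negative map to 0, underflow to the smallest non-zero byte
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= fzero + 0x100) {
            return -1; // overflow maps to the largest value
        }
        return (byte) (smallfloat - fzero);
    }

    // Mirrors SmallFloat.byte315ToFloat: expands the byte back to a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Option one from the thread: numTerms ~= (1 / norm)^2 on the decoded norm.
    static double estimateNumTerms(byte storedNorm) {
        double norm = decode(storedNorm);
        return 1.0 / (norm * norm);
    }

    public static void main(String[] args) {
        for (int numTerms : new int[] {1, 10, 100, 1000}) {
            byte stored = encode(lengthNorm(numTerms));
            System.out.printf("numTerms=%d stored=%d estimate=%.2f%n",
                    numTerms, stored, estimateNumTerms(stored));
        }
    }
}
```

For a 100-term document, 0.1 is stored as a byte that decodes to 0.09375, so the inverted estimate comes out at about 113.78; 10 terms come back as 10.24 and 1000 terms as 1024. The inversion recovers roughly the right magnitude but not the exact count, which suggests the second option (storing the count yourself as a doc value) is the safer choice when exact values matter.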
