add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
-----------------------------------------------------
                 Key: LUCENE-3290
                 URL: https://issues.apache.org/jira/browse/LUCENE-3290
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/index
            Reporter: Robert Muir
            Assignee: Robert Muir
             Fix For: 4.0


For scoring systems like lnu.ltc (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to supply 3 stats:
* average tf within d
* # of unique terms within d
* average number of unique terms across the field

If we add FieldInvertState.numUniqueTerms, you can incorporate the first two into your norms/docvalues (once we cut over): the average tf within d is simply length / numUniqueTerms.

To compute the average across the field, we can just write the sum of all terms' docFreqs into the terms dictionary header (Terms.sumDocFreq), and you can then divide this by maxDoc to get the average.
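The arithmetic behind the three stats can be sketched in plain Java. This is only an illustration under stated assumptions, not the proposed patch: it does not use any Lucene API, the class FieldStatsSketch and its methods are hypothetical names, each document's field is modeled as a list of term strings, and docs.size() stands in for maxDoc.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper; none of these names come from Lucene.
public class FieldStatsSketch {

    /** Distinct terms in one document's field (what numUniqueTerms would hold). */
    public static int numUniqueTerms(List<String> docTerms) {
        return new HashSet<>(docTerms).size();
    }

    /** Average tf within a document: field length / numUniqueTerms. */
    public static double averageTf(List<String> docTerms) {
        int unique = numUniqueTerms(docTerms);
        return unique == 0 ? 0.0 : (double) docTerms.size() / unique;
    }

    /** Sum of every term's document frequency (what sumDocFreq would hold). */
    public static long sumDocFreq(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term once per document, regardless of its tf.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        long sum = 0;
        for (int df : docFreq.values()) {
            sum += df;
        }
        return sum;
    }

    /** Average number of unique terms per document: sumDocFreq / maxDoc. */
    public static double averageUniqueTerms(List<List<String>> docs) {
        // Assumption: docs.size() plays the role of maxDoc.
        return docs.isEmpty() ? 0.0 : (double) sumDocFreq(docs) / docs.size();
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("a", "b", "a", "c"),
            Arrays.asList("a", "d"));
        System.out.println("numUniqueTerms(doc0) = " + numUniqueTerms(docs.get(0)));
        System.out.println("averageTf(doc0)      = " + averageTf(docs.get(0)));
        System.out.println("sumDocFreq           = " + sumDocFreq(docs));
        System.out.println("averageUniqueTerms   = " + averageUniqueTerms(docs));
    }
}
```

The sketch relies on the identity that summing docFreq over all terms counts each (term, document) pair exactly once, which is the same as summing numUniqueTerms over all documents. That is why a single number in the terms dictionary header is enough to recover the field-wide average.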