add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
-----------------------------------------------------

                 Key: LUCENE-3290
                 URL: https://issues.apache.org/jira/browse/LUCENE-3290
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/index
            Reporter: Robert Muir
            Assignee: Robert Muir
             Fix For: 4.0


For scoring systems like lnu.ltc 
(http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to 
supply three statistics, where d is a document:
* average tf within d
* number of unique terms within d
* average number of unique terms per document across the field

If we add FieldInvertState.numUniqueTerms, you can incorporate the first two 
into your norms/docvalues (once we cut over): 
the average tf within d is length / numUniqueTerms.
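
As a rough illustration of the two per-document statistics, here is a minimal 
sketch in plain Java. It does not use Lucene; the class DocStats and its 
methods are hypothetical names, and the token list stands in for whatever the 
indexing chain would see for one field of one document.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: computes the per-document
// statistics discussed above from a plain list of tokens.
public class DocStats {

    /** Number of distinct terms in one document's field. */
    static int numUniqueTerms(List<String> tokens) {
        Set<String> unique = new HashSet<>(tokens);
        return unique.size();
    }

    /** Average term frequency within the document: length / numUniqueTerms. */
    static double averageTf(List<String> tokens) {
        int unique = numUniqueTerms(tokens);
        // Guard against an empty field to avoid dividing by zero.
        return unique == 0 ? 0.0 : (double) tokens.size() / unique;
    }
}
```

Both values depend only on the document itself, which is why they could be 
folded into a per-document norm at index time.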

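To compute the average across the field, we can just write the sum of all 
terms' docfreqs into the terms dictionary header. Each document adds one to 
the docfreq of every unique term it contains, so this sum equals the total 
number of unique terms over all documents,
and you can then divide it by maxDoc to get the average.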
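
The field-level average can be sketched the same way. This is plain Java 
under stated assumptions, not the Lucene API: FieldStats, its methods, and 
the term-to-docfreq map are hypothetical stand-ins for the terms dictionary.

```java
import java.util.Map;

// Hypothetical helper, not part of Lucene: derives the field-level average
// from per-term document frequencies.
public class FieldStats {

    /** Sum of the document frequencies of all terms in the field. */
    static long sumDocFreq(Map<String, Integer> docFreqByTerm) {
        long sum = 0;
        for (int df : docFreqByTerm.values()) {
            sum += df;
        }
        return sum;
    }

    /** Average number of unique terms per document: sumDocFreq / maxDoc. */
    static double averageUniqueTerms(long sumDocFreq, int maxDoc) {
        // Guard against an empty index to avoid dividing by zero.
        return maxDoc == 0 ? 0.0 : (double) sumDocFreq / maxDoc;
    }
}
```

For example, with three documents and docfreqs of 2, 1 and 3 for three terms, 
the sum is 6 and the average is 6 / 3 = 2 unique terms per document.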


