[ 
https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062363#comment-13062363
 ] 

Robert Muir commented on LUCENE-3290:
-------------------------------------

To explain further, there are two parts to the patch:
# FieldInvertState gets an additional variable, numUniqueTerms. It is not stored 
anywhere; it just lets you use the value as part of your 
Similarity.computeNorm calculation, if you like.
# In trunk *only*, we store sumDocFreq, which changes the index format. This 
is not easy to backport to 3.x, because fields are not clearly separated there 
(which would make it a little tricky), and 3.x is missing newer stats such as 
totalTermFreq anyway (because storing them would bloat TermInfos).


> add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
> -----------------------------------------------------
>
>                 Key: LUCENE-3290
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3290
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3290.patch, LUCENE-3290.patch
>
>
> For scoring systems like lnu.ltc 
> (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to 
> supply 3 stats:
> * average tf within d
> * # of unique terms within d
> * average number of unique terms across field
> If we add FieldInvertState.numUniqueTerms, you can incorporate the first two 
> into your norms/docvalues (once we cut over), 
> with the average tf within d being length / numUniqueTerms.
> To compute the average across the field, we can just write the sum of all 
> terms' docfreqs into the terms dictionary header, 
> and you can then divide this by maxdoc to get the average.
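
The arithmetic in the description above can be sketched as follows. This is only an illustration, not code from the patch: the class LnuStatsSketch and its method names are hypothetical, and the inputs are plain numbers rather than values read from Lucene. The second method relies on the fact that summing docFreq over all terms counts each (term, document) pair once, which is the same as summing the number of unique terms over all documents.

```java
// Hypothetical helper for illustration only; it is not part of Lucene or the patch.
final class LnuStatsSketch {
    private LnuStatsSketch() {}

    /** Average tf within a document: field length divided by its number of unique terms. */
    static double averageTermFreq(int length, int numUniqueTerms) {
        if (numUniqueTerms <= 0) {
            return 0.0; // empty field: avoid division by zero
        }
        return (double) length / numUniqueTerms;
    }

    /** Average number of unique terms per document: sum of all terms' docFreqs divided by maxDoc. */
    static double averageUniqueTerms(long sumDocFreq, int maxDoc) {
        if (maxDoc <= 0) {
            return 0.0; // empty index: avoid division by zero
        }
        return (double) sumDocFreq / maxDoc;
    }
}
```

For example, three documents with 2, 3, and 4 unique terms in the field give a sumDocFreq of 9, so averageUniqueTerms(9, 3) is 3.0. Note that dividing by maxdoc counts every document, so documents without the field contribute zero to the average.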

