[ https://issues.apache.org/jira/browse/LUCENE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062363#comment-13062363 ]
Robert Muir commented on LUCENE-3290:
-------------------------------------

Just some more explanation; there are two parts to the patch:
# FieldInvertState gets an additional variable, numUniqueTerms. It's not stored anywhere; it just lets you use this value as part of your Similarity.computeNorm calculation, if you like.
# In trunk *only*, we store sumDocFreq, which changes the index format. This is not easy to backport to 3.x: fields are not clearly separated there (which would make it a little tricky), and 3.x is missing the other new stats such as totalTermFreq anyway (because storing them would bloat TermInfos).

> add FieldInvertState.numUniqueTerms, Terms.sumDocFreq
> -----------------------------------------------------
>
>                 Key: LUCENE-3290
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3290
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3290.patch, LUCENE-3290.patch
>
>
> For scoring systems like lnu.ltc
> (http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf), we need to
> supply 3 stats:
> * average tf within d
> * # of unique terms within d
> * average number of unique terms across field
> If we add FieldInvertState.numUniqueTerms, you can incorporate the first two
> into your norms/docvalues (once we cut over),
> the average tf within d being length / numUniqueTerms.
> To compute the average across the field, we can just write the sum of all
> terms' docfreqs into the terms dictionary header,
> and you can then divide this by maxdoc to get the average.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
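The arithmetic in the quoted description can be checked with a small standalone sketch. This is not Lucene code: it does not touch FieldInvertState, Terms or any index, and every method below (numUniqueTerms, averageTf, sumDocFreq, averageUniqueTerms) is a hypothetical helper working on a toy in-memory field where each document is just a list of tokens. It also shows why dividing sumDocFreq by maxdoc yields the average: each document adds exactly one to the docFreq of each of its distinct terms, so the sum of docFreq over all terms equals the sum of numUniqueTerms over all documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch (NOT Lucene's API) of the three lnu.ltc statistics discussed in
 * LUCENE-3290. Each document is a hypothetical list of tokens for one field.
 */
public class LnuStatsSketch {

    /** Number of distinct terms in one document (the proposed numUniqueTerms). */
    static int numUniqueTerms(List<String> tokens) {
        return new HashSet<>(tokens).size();
    }

    /** Average tf within d: field length divided by numUniqueTerms. */
    static double averageTf(List<String> tokens) {
        return (double) tokens.size() / numUniqueTerms(tokens);
    }

    /** Sum over all terms of their docFreq, i.e. what sumDocFreq would record. */
    static long sumDocFreq(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> tokens : docs) {
            // A document counts at most once toward each term's docFreq.
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        long sum = 0;
        for (int df : docFreq.values()) {
            sum += df;
        }
        return sum;
    }

    /** Average number of unique terms per document: sumDocFreq / maxDoc. */
    static double averageUniqueTerms(List<List<String>> docs) {
        // Toy assumption: maxDoc is just the document count
        // (no deletions, and every document has the field).
        int maxDoc = docs.size();
        return (double) sumDocFreq(docs) / maxDoc;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("a", "b", "a", "c"),  // length 4, 3 unique terms
            Arrays.asList("b", "b"),            // length 2, 1 unique term
            Arrays.asList("a", "c", "d"));      // length 3, 3 unique terms

        for (int i = 0; i < docs.size(); i++) {
            List<String> d = docs.get(i);
            System.out.println("doc " + i + ": numUniqueTerms=" + numUniqueTerms(d)
                + " avgTf=" + averageTf(d));
        }
        System.out.println("sumDocFreq=" + sumDocFreq(docs));
        System.out.println("avgUniqueTerms=" + averageUniqueTerms(docs));
    }
}
```

In this toy corpus sumDocFreq is 2 + 2 + 2 + 1 = 7 (terms a, b, c, d), which matches 3 + 1 + 3 unique terms per document, so the field average is 7 / 3. The sketch assumes no deletions and that every document has the field; with deletions or sparse fields, dividing by maxdoc would not necessarily give an exact per-document average over live documents containing the field.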