[
https://issues.apache.org/jira/browse/LUCENE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192796#comment-13192796
]
Robert Muir commented on LUCENE-3722:
-------------------------------------
Just a simple example of what I meant about the -1 case:
Let's assume you have two shards, and one returns -1 for totalTermFreq().
If you were using BasicModelIF, which scores as:
{code}
tf_norm * log(1 + (maxdoc + 1)/(totalTermFreq + 0.5))
{code}
It's far better to actually use -1 than a 'partial/incorrect' totalTermFreq,
because in that case the formula will fall back to totalTermFreq=docFreq...
It also must do this when frequencies are omitted (omitTF), and for that
case the formula is still correct. Either way it falls back nicely to
IDF:
{code}
tf_norm * log(1 + (maxdoc + 1)/(docFreq + 0.5))
{code}
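To make the fallback concrete, here is a rough sketch of it in Java. This is not Lucene's actual BasicModelIF code: the class and method names are made up for illustration, and the natural log is an assumption, since the formula above does not state a base.

```java
// Hypothetical sketch only; not the real Lucene BasicModelIF implementation.
final class BasicModelIFSketch {
    private BasicModelIFSketch() {}

    /**
     * Picks the frequency the formula should use: totalTermFreq when it is
     * known, or docFreq when totalTermFreq is reported as -1 (unavailable).
     */
    static long effectiveTermFreq(long totalTermFreq, long docFreq) {
        return totalTermFreq == -1 ? docFreq : totalTermFreq;
    }

    /**
     * tf_norm * log(1 + (maxdoc + 1) / (freq + 0.5)), where freq is
     * totalTermFreq or the docFreq fallback. Natural log is an assumption.
     */
    static double score(double tfNorm, long maxDoc, long totalTermFreq, long docFreq) {
        long freq = effectiveTermFreq(totalTermFreq, docFreq);
        return tfNorm * Math.log(1 + (maxDoc + 1) / (freq + 0.5));
    }
}
```

With this shape, score(tfNorm, maxDoc, -1, docFreq) gives the same value as score(tfNorm, maxDoc, docFreq, docFreq), which is the IDF-style fallback described above.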
Yeah, I totally forgot about this being -1 in the omitTF case, so we should
still really think this summation through and make it easy to prevent mistakes,
because I gather omitTF isn't going anywhere... grrr
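One way the summation could guard against this, as a sketch only: if any shard reports -1, the sum is -1 too, so a partial total never reaches the similarity. The helper below is hypothetical and not part of any patch on this issue.

```java
// Hypothetical helper; the name and placement are illustrative assumptions.
final class ShardStatsSketch {
    private ShardStatsSketch() {}

    /**
     * Sums per-shard totalTermFreq values. Returns -1 if any shard reports
     * -1 (unavailable), so callers fall back instead of using a partial sum.
     */
    static long sumTotalTermFreq(long... perShard) {
        long sum = 0;
        for (long value : perShard) {
            if (value == -1) {
                return -1; // one unknown shard makes the whole total unknown
            }
            sum = Math.addExact(sum, value); // throws on long overflow
        }
        return sum;
    }
}
```

For the two-shard case above, sumTotalTermFreq(120, -1) returns -1 rather than 120, so the formula falls back to docFreq.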
> make similarities/term/collectionstats take long (for > 2B docs)
> ----------------------------------------------------------------
>
> Key: LUCENE-3722
> URL: https://issues.apache.org/jira/browse/LUCENE-3722
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 4.0
> Reporter: Robert Muir
> Attachments: LUCENE-3722.patch, LUCENE-3722.patch, LUCENE-3722.patch
>
>
> As noted by Yonik and Andrzej on SOLR-1632, this would be useful for
> distributed scoring.
> We can also add a sugar method add() to both of these to make it easier to
> sum.