[ 
https://issues.apache.org/jira/browse/LUCENE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192796#comment-13192796
 ] 

Robert Muir commented on LUCENE-3722:
-------------------------------------

Just a simple example of what I meant about the -1 case:

Let's assume you have two shards, and one returns -1 for totalTermFreq().

If you were using BasicModelIF, which scores as:
{code}
tf_norm * log(1 + (maxdoc + 1)/(totalTermFreq + 0.5))
{code}

it's far better to actually use -1 than a 'partial/incorrect' totalTermFreq,
because in that case the formula will fall back to totalTermFreq=docFreq.
It also must do this when frequencies are omitted (omitTF), and for that
case the formula is still correct. Either way, it falls back nicely to
IDF:

{code}
tf_norm * log(1 + (maxdoc + 1)/(docFreq + 0.5))
{code}
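
A minimal sketch of that fallback, just to make the two formulas above concrete. The class and method names here are hypothetical (this is not the actual BasicModelIF code), and natural log is an assumption since the formula above only says `log`:

```java
// Hypothetical illustration only; not Lucene's BasicModelIF implementation.
final class IfScoreSketch {

    /** Returns totalTermFreq, or docFreq when totalTermFreq is unavailable (-1). */
    static long effectiveTermFreq(long totalTermFreq, long docFreq) {
        return totalTermFreq == -1 ? docFreq : totalTermFreq;
    }

    /**
     * tf_norm * log(1 + (maxdoc + 1) / (F + 0.5)), where F is totalTermFreq,
     * or docFreq if totalTermFreq is -1 (the IDF-style fallback).
     */
    static double score(double tfNorm, long maxDoc, long totalTermFreq, long docFreq) {
        long f = effectiveTermFreq(totalTermFreq, docFreq);
        // (f + 0.5) is a double, so the division is floating-point.
        return tfNorm * Math.log(1 + (maxDoc + 1) / (f + 0.5));
    }
}
```

With totalTermFreq = -1 the score depends only on docFreq, which is the same result the omitTF case gives.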

Yeah, I totally forgot about this being -1 in the omitTF case, so we should
still really think this summation through and make it easy to prevent mistakes,
because I gather omitTF isn't going anywhere... grrr
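
One way the summation could guard against this, as a rough sketch only: treat -1 as "unknown" and let it win, so a merged value is never a partial sum. The names below are hypothetical and are not taken from the attached patch:

```java
// Hypothetical illustration only; not the add() method from the patch.
final class ShardStatsSketch {

    /** Sums two per-shard totalTermFreq values; -1 means "not available". */
    static long addTotalTermFreq(long a, long b) {
        if (a == -1 || b == -1) {
            return -1; // a partial sum would be misleading, so stay "unknown"
        }
        return a + b;
    }

    /** Sums totalTermFreq over all shards, returning -1 if any shard reports -1. */
    static long sumTotalTermFreq(long[] perShard) {
        long sum = 0;
        for (long value : perShard) {
            sum = addTotalTermFreq(sum, value);
            if (sum == -1) {
                return -1; // no point continuing once the total is unknown
            }
        }
        return sum;
    }
}
```

For the two-shard example, summing {10, -1} this way gives -1 rather than 10, so the scorer can still fall back to docFreq.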
                
> make similarities/term/collectionstats take long (for > 2B docs)
> ----------------------------------------------------------------
>
>                 Key: LUCENE-3722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3722
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3722.patch, LUCENE-3722.patch, LUCENE-3722.patch
>
>
> As noted by Yonik and Andrzej on SOLR-1632, this would be useful for 
> distributed scoring.
> we can also add a sugar method add() to both of these to make it easier to 
> sum.
