[jira] [Commented] (LUCENE-6711) Instead of docCount(), maxDoc() is used for numberOfDocuments in SimilarityBase

Robert Muir (JIRA) Sat, 01 Aug 2015 11:50:34 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650499#comment-14650499 ]


Robert Muir commented on LUCENE-6711:
-------------------------------------

IndexReader/Terms etc. still document this as an optional statistic, and I think 
we should keep it that way. E.g. it may be hard to compute for some FilterReader, 
who knows.

So I think we should do a fallback like the other statistics: check for -1 and 
use maxDoc if it's unsupported.
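A minimal sketch of the proposed fallback, assuming (as the comment says) that an unsupported docCount is reported as -1. The class DocCountFallback and its numberOfDocuments method are hypothetical helpers for illustration, not Lucene's actual code:

```java
// Hypothetical helper illustrating the proposed docCount -> maxDoc fallback.
// Not Lucene's implementation; it only mirrors the "-1 means unsupported" rule.
public final class DocCountFallback {
    private DocCountFallback() {}

    /**
     * Returns the document count to use in scoring: docCount when the
     * statistic is available, otherwise maxDoc (docCount == -1 means unsupported).
     */
    public static long numberOfDocuments(long docCount, long maxDoc) {
        return docCount == -1 ? maxDoc : docCount;
    }

    public static void main(String[] args) {
        // Statistic supported: use the per-field document count.
        System.out.println(numberOfDocuments(750, 1000));
        // Statistic unsupported (e.g. some FilterReader): fall back to maxDoc.
        System.out.println(numberOfDocuments(-1, 1000));
    }
}
```

Running main prints 750 and then 1000.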

But I think it's a good time to make the change. For ordinary users, it will not 
be trappy or happen incrementally: all these statistics have been supported since 
4.0. We should fix TFIDFSimilarity and BM25Similarity too.
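To illustrate why the choice matters for BM25, here is a sketch of the BM25 IDF term, ln(1 + (N - df + 0.5) / (df + 0.5)), computed once with N = maxDoc and once with N = docCount. The counts are made-up numbers for a sparse field that only 100 of 1000 documents contain; the class Bm25IdfComparison is hypothetical and does not call Lucene:

```java
// Sketch comparing BM25 IDF with maxDoc vs. docCount as the document count.
// Hypothetical class with illustrative numbers; it does not use Lucene itself.
public final class Bm25IdfComparison {
    private Bm25IdfComparison() {}

    /** BM25 IDF: ln(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5)). */
    public static double idf(long docFreq, long numDocs) {
        return Math.log(1 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long maxDoc = 1000;   // all documents in the index
        long docCount = 100;  // documents that have at least one term in this field
        long docFreq = 50;    // documents containing the term in this field

        System.out.printf("idf using maxDoc:   %.3f%n", idf(docFreq, maxDoc));
        System.out.printf("idf using docCount: %.3f%n", idf(docFreq, docCount));
    }
}
```

With these numbers the maxDoc-based IDF is about 2.99 while the docCount-based IDF is about 0.69: counting documents that lack the field makes a term that appears in half of the field's documents look rare.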

> Instead of docCount(), maxDoc() is used for numberOfDocuments in 
> SimilarityBase
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-6711
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6711
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.2.1
>            Reporter: Ahmet Arslan
>            Priority: Minor
>             Fix For: 5.3
>
>         Attachments: LUCENE-6711.patch
>
>
> {{SimilarityBase.java}} has the following line :
> {code}
>  long numberOfDocuments = collectionStats.maxDoc();
> {code}
> It seems like {{collectionStats.docCount()}}, which returns the total number 
> of documents that have at least one term for this field, is a more appropriate 
> statistic here. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


