[
https://issues.apache.org/jira/browse/LUCENE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650499#comment-14650499
]
Robert Muir commented on LUCENE-6711:
-------------------------------------
IndexReader/Terms etc. still document this as an optional statistic, and I think
we should keep it that way. E.g. maybe it's hard to compute for some
FilterReader, who knows.
So I think we should do a fallback like the other statistics: check for -1 and
use maxDoc if it's unsupported.
But I think it's a good time to make the change. For ordinary users, it will not
be trappy or happen incrementally: all these statistics have been supported
since 4.0. We should fix TFIDFSimilarity and BM25Similarity too.
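
A minimal sketch of the fallback described above, not the actual patch. The
FieldStats class is a hypothetical stand-in for CollectionStatistics that
models only maxDoc() and docCount(), and the helper name numberOfDocuments is
likewise an assumption made for illustration. It assumes, as stated above, that
an unsupported docCount is reported as -1.

```java
// Hypothetical stand-in for Lucene's CollectionStatistics; only the two
// accessors discussed in this issue are modeled.
final class FieldStats {
    private final long maxDoc;
    private final long docCount; // -1 means "not supported by this reader"

    FieldStats(long maxDoc, long docCount) {
        this.maxDoc = maxDoc;
        this.docCount = docCount;
    }

    long maxDoc() { return maxDoc; }

    long docCount() { return docCount; }
}

final class DocCountFallback {
    private DocCountFallback() {}

    /** Prefer docCount; fall back to maxDoc when docCount is reported as -1. */
    static long numberOfDocuments(FieldStats stats) {
        long docCount = stats.docCount();
        return docCount == -1 ? stats.maxDoc() : docCount;
    }

    public static void main(String[] args) {
        // Field present in 250 of 1000 documents: docCount is used.
        System.out.println(numberOfDocuments(new FieldStats(1000, 250))); // prints 250
        // Statistic unsupported (-1): maxDoc is used instead.
        System.out.println(numberOfDocuments(new FieldStats(1000, -1)));  // prints 1000
    }
}
```

For a sparse field the two values can differ a lot, which is why the choice
matters for any similarity that normalizes by the number of documents.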
> Instead of docCount(), maxDoc() is used for numberOfDocuments in
> SimilarityBase
> -------------------------------------------------------------------------------
>
> Key: LUCENE-6711
> URL: https://issues.apache.org/jira/browse/LUCENE-6711
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 5.2.1
> Reporter: Ahmet Arslan
> Priority: Minor
> Fix For: 5.3
>
> Attachments: LUCENE-6711.patch
>
>
> {{SimilarityBase.java}} has the following line :
> {code}
> long numberOfDocuments = collectionStats.maxDoc();
> {code}
> It seems like {{collectionStats.docCount()}}, which returns the total number
> of documents that have at least one term for this field, is a more
> appropriate statistic here.