[
https://issues.apache.org/jira/browse/LUCENE-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419860#comment-13419860
]
Michael McCandless commented on LUCENE-4242:
--------------------------------------------
bq. liveDocsRatio will be the same for every term - if you want to take into
account deleted docs, then just do it when you set maxTermDocFreq.
Good! I'll move it up front.
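To make the equivalence concrete, here is a minimal sketch of the two variants. The class and method names are hypothetical (they are not part of Lucene, Solr, or the attached patch), and it assumes deletions are spread roughly evenly across terms, so that one ratio of numDocs to maxDoc applies to every term:

```java
/** Hypothetical helper for illustration only; not Lucene API and not the LUCENE-4242 patch. */
final class LiveDocsScaling {
    private LiveDocsScaling() {}

    /** Fraction of documents that are still live (assumed to be the same for every term). */
    static double liveDocsRatio(int numDocs, int maxDoc) {
        return maxDoc == 0 ? 1.0 : (double) numDocs / maxDoc;
    }

    /** Per-term variant: pro-rate each term's raw docFreq, then compare to the threshold. */
    static boolean keepPerTerm(int docFreq, int maxTermDocFreq, double ratio) {
        return docFreq * ratio <= maxTermDocFreq;
    }

    /** Up-front variant: loosen the threshold once, then compare the raw docFreq against it. */
    static int adjustedThreshold(int maxTermDocFreq, double ratio) {
        if (ratio <= 0.0) {
            return Integer.MAX_VALUE; // no live docs: nothing meaningful to filter on
        }
        return (int) Math.min(Integer.MAX_VALUE, Math.floor(maxTermDocFreq / ratio));
    }
}
```

With maxDoc = 1000, numDocs = 500 and maxTermDocFreq = 100, the adjusted threshold is 200. A term with a raw docFreq of 120 fails the unadjusted check (120 > 100) but passes both variants, while a term with a raw docFreq of 250 is rejected by both. Since the ratio does not depend on the term, computing the threshold once avoids a multiplication per term.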
bq. Also, this code comes from Solr
True, but it matters not where the code came from :)
I think what you meant to say is "where Solr invokes DocTermOrds ...". So then
the question is what contract we should have: should the caller be expected to
pro-rate by deletes itself, or should DocTermOrds do so (= this patch)?
bq. where maxTermDocFreq is set as a percentage of maxDoc - so things are
already scaled by the number of deleted docs.
Wait, how is that taking deleted docs into account? maxDoc doesn't reflect
deletions. Looks to me like deleted docs are not factored in now by Solr,
either?
> UnInverted cache uses term freq to filter out terms (but deleted docs are
> included in the freq count)
> -----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-4242
> URL: https://issues.apache.org/jira/browse/LUCENE-4242
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0
> Reporter: roman
> Priority: Minor
> Attachments: LUCENE-4242.patch, LUCENE-4242.patch, LUCENE-4242.patch
>
>
> The TermEnum.docFreq() count is used when computing the uninverted index
> (DocTermOrds.uninvert()). The code looks like this:
>   final int df = te.docFreq();
>   if (df <= maxTermDocFreq) {
> So if the index contains deleted documents and maxTermDocFreq is low, a
> term can be excluded even though its frequency among live docs is within
> the limit. Most likely, the cache will be incomplete.