[
https://issues.apache.org/jira/browse/LUCENE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477112#comment-13477112
]
Robert Muir commented on LUCENE-4485:
-------------------------------------
+1
> CheckIndex's term stats should not include deleted docs
> -------------------------------------------------------
>
> Key: LUCENE-4485
> URL: https://issues.apache.org/jira/browse/LUCENE-4485
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-4485.patch
>
>
> I was looking at the CheckIndex output on and index that has deletions, eg:
> {noformat}
> 4 of 30: name=_90 docCount=588408
> codec=Lucene41
> compound=false
> numFiles=14
> size (MB)=265.318
> diagnostics = {os=Linux, os.version=3.2.0-23-generic, mergeFactor=10,
> source=merge, lucene.version=5.0-SNAPSHOT, os.arch=amd64,
> mergeMaxNumSegments=-1, java.version=1.7.0_07, java.vendor=Oracle Corporation}
> has deletions [delGen=1]
> test: open reader.........OK [39351 deleted docs]
> test: fields..............OK [8 fields]
> test: field norms.........OK [2 fields]
> test: terms, freq, prox...OK [4910342 terms; 61319238 terms/docs pairs;
> 65597188 tokens]
> test (ignoring deletes): terms, freq, prox...OK [4910342 terms; 61319238
> terms/docs pairs; 70293065 tokens]
> test: stored fields.......OK [1647171 total field count; avg 3 fields per
> doc]
> test: term vectors........OK [0 total vector count; avg 0 term/freq
> vector fields per doc]
> test: docvalues...........OK [0 total doc count; 1 docvalues fields]
> {noformat}
> If you compare the {{test: terms, freq, prox}} (includes deletions) and the
> next line (doesn't include deletions), it's confusing because only the 3rd
> number (tokens) reflects deletions. I think the first two numbers should
> also reflect deletions? This way an app could get a sense of how much
> "deadweight" is in the index due to un-reclaimed deletions...
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]