[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3473:
--------------------------------

    Attachment: LUCENE-3473.patch

Patch adding the checks to CheckIndex.

There were some problems:
* IndexReader.getUniqueTermCount doesn't work in trunk, but works fine in 3.x.
This is because it sums per-field across the Terms API, but the PreFlex codec
doesn't know this information per-field.
* If a field has no postings (but exists in FieldInfos), then
IR.getUniqueTermCount hits an NPE (ant test-core -Dtestcase=TestNorms
-Dtestmethod=testCustomEncoder
-Dtests.seed=-6a2248fc7313e45:c41a685f840f6ed:-5a3fd5b8ec315508).
* MemoryCodec didn't implement Terms.getUniqueTermCount, probably just
forgotten because it's not abstract (instead throwing UOE by default).

So, I fixed MemoryCodec to implement Terms.getUniqueTermCount, changed
Terms.getUniqueTermCount to be abstract (return -1 if you cannot implement it),
and added Fields.getUniqueTermCount, which is called by IR.getUniqueTermCount.
The default implementation sums across fields, but PreFlex overrides it so that
its IR.getUniqueTermCount works again.
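
To make the shape of that change concrete, here is a rough sketch of the sum-across-fields default with a codec-level override. It is not the patch itself: SketchTerms, SketchFields, SegmentWideFields and UniqueTermCountSketch are hypothetical stand-ins rather than Lucene classes, and returning -1 from the sum as soon as one field reports -1 is an assumption about how "unknown" should propagate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a per-field terms dictionary (not the Lucene class).
abstract class SketchTerms {
    /** Number of unique terms in this field, or -1 if it cannot be computed. */
    abstract long getUniqueTermCount();
}

// Hypothetical stand-in for the all-fields view of a segment.
class SketchFields {
    protected final Map<String, SketchTerms> termsByField;

    SketchFields(Map<String, SketchTerms> termsByField) {
        this.termsByField = termsByField;
    }

    /** Default: sum the per-field counts. */
    long getUniqueTermCount() {
        long total = 0;
        for (SketchTerms terms : termsByField.values()) {
            if (terms == null) {
                continue; // field is known but has no postings: skip, no NPE
            }
            long count = terms.getUniqueTermCount();
            if (count < 0) {
                return -1; // assumption: one unknown field makes the total unknown
            }
            total += count;
        }
        return total;
    }
}

// A codec that only knows a segment-wide count overrides the summing default.
class SegmentWideFields extends SketchFields {
    private final long segmentWideCount;

    SegmentWideFields(Map<String, SketchTerms> termsByField, long segmentWideCount) {
        super(termsByField);
        this.segmentWideCount = segmentWideCount;
    }

    @Override
    long getUniqueTermCount() {
        return segmentWideCount;
    }
}

class UniqueTermCountSketch {
    /** Helper for the demo: a terms dictionary that reports a fixed count. */
    static SketchTerms fixed(final long count) {
        return new SketchTerms() {
            @Override
            long getUniqueTermCount() {
                return count;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, SketchTerms> fields = new LinkedHashMap<>();
        fields.put("title", fixed(3));
        fields.put("empty", null); // exists in the field list, no postings
        fields.put("body", fixed(4));
        System.out.println(new SketchFields(fields).getUniqueTermCount());

        fields.put("legacy", fixed(-1)); // per-field count not available
        System.out.println(new SketchFields(fields).getUniqueTermCount());
        System.out.println(new SegmentWideFields(fields, 7).getUniqueTermCount());
    }
}
```

The null check in the loop corresponds to the "field with no postings" case from the list above; whether to skip such a field or treat it as an error is a design choice this sketch does not settle.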

We might want to deprecate the latter method when 3.x indexes no longer need to
be supported, or maybe it's just fine as-is (you have to do the summing
somewhere).
                
> CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3473
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.4, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3473.patch
>
>
> Just glancing at the code, it seems to sort of do this check, but maybe only
> in the hasOrd==true case (which seems to be testing something else)?
> It would be nice to verify this also for terms dicts that don't support ord.
> We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and
> preflex.
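
For readers following the issue, the kind of check requested above could look roughly like the sketch below. It is not the CheckIndex code: TermCountCheck and verify are hypothetical names, a plain Iterator<String> stands in for a terms enumeration, and treating a reported count of -1 as "cannot verify, skip" is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of Lucene: recount terms and compare.
class TermCountCheck {
    /**
     * Walks every term of a field, recomputes the unique term count, and
     * compares it with the count the terms dictionary reported.
     * A reported count of -1 is treated as "not available" and is not checked.
     * Returns the recomputed count.
     */
    static long verify(String field, long reported, Iterator<String> terms) {
        long recomputed = 0;
        while (terms.hasNext()) {
            terms.next();
            recomputed++;
        }
        if (reported != -1 && reported != recomputed) {
            throw new IllegalStateException(
                "field " + field + ": numUniqueTerms=" + reported
                    + " but recomputedNumUniqueTerms=" + recomputed);
        }
        return recomputed;
    }

    public static void main(String[] args) {
        Iterator<String> terms = Arrays.asList("apple", "banana", "cherry").iterator();
        System.out.println(verify("body", 3, terms));
    }
}
```

The same comparison works per-field or for a whole segment; only the source of the reported number and the set of terms being walked changes.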
