[ https://issues.apache.org/jira/browse/LUCENE-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-3473:
--------------------------------
    Attachment: LUCENE-3473.patch

Patch adding the checks to CheckIndex. There were some problems:

* IndexReader.getUniqueTermCount doesn't work in trunk, but works fine in 3.x. This is because it sums per-field across the Terms API, but the PreFlex codec doesn't know this information per-field.
* If a field has no postings (but exists in fieldinfos), then IR.getUniqueTermCount hits an NPE (ant test-core -Dtestcase=TestNorms -Dtestmethod=testCustomEncoder -Dtests.seed=-6a2248fc7313e45:c41a685f840f6ed:-5a3fd5b8ec315508).
* MemoryCodec didn't implement Terms.getUniqueTermCount, probably just forgotten because it's not abstract (instead throwing UOE by default).

So, I fixed MemoryCodec to implement Terms.getUniqueTermCount, changed Terms.getUniqueTermCount to be abstract (return -1 if you cannot implement it), and added Fields.getUniqueTermCount, called by IR.getUniqueTermCount: the default implementation sums across fields, but PreFlex overrides it so that its IR.getUniqueTermCount works again.

We might want to deprecate the latter method when 3.x indexes no longer need to be supported, or maybe it's just fine as-is (you have to do the summing somewhere).

> CheckIndex should verify numUniqueTerms == recomputedNumUniqueTerms
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3473
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3473
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.4, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-3473.patch
>
>
> Just glancing at the code it seems to sorta do this check, but only in the
> hasOrd==true case maybe (which seems to be testing something else)?
> It would be nice to verify this also for terms dicts that don't support ord.
> We should add explicit checks per-field in 4.x, and for-all-fields in 3.x and
> preflex.
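For readers following the thread, here is a rough sketch of the scheme described in the message: a per-field count that may return -1, a default all-fields sum that skips fields without postings, an override for a format that only knows one total, and a check that compares reported against recomputed counts. Everything in it (SimpleTerms, ListTerms, SimpleFields, TotalOnlyFields, recount, check) is a hypothetical stand-in written for illustration. It is not the code in LUCENE-3473.patch and does not use the real Lucene classes or signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are NOT the real Lucene classes.
class UniqueTermCountSketch {

    /** Per-field terms dictionary (plays the role Terms has in the message). */
    static abstract class SimpleTerms {
        /** Number of unique terms, or -1 if this implementation cannot report it. */
        abstract long getUniqueTermCount();
        abstract Iterator<String> iterator();
    }

    /** Terms backed by a list of distinct terms, with a separately supplied reported count. */
    static class ListTerms extends SimpleTerms {
        private final List<String> terms;
        private final long reported;
        ListTerms(long reported, String... terms) {
            this.reported = reported;
            this.terms = Arrays.asList(terms);
        }
        @Override long getUniqueTermCount() { return reported; }
        @Override Iterator<String> iterator() { return terms.iterator(); }
    }

    /** All fields of one index; a null value models a field that has no postings. */
    static class SimpleFields {
        final Map<String, SimpleTerms> byField = new LinkedHashMap<>();

        /** Default: sum across fields, skipping null entries and propagating -1. */
        long getUniqueTermCount() {
            long total = 0;
            for (SimpleTerms terms : byField.values()) {
                if (terms == null) continue;       // avoids the NPE case
                long count = terms.getUniqueTermCount();
                if (count == -1) return -1;        // unknown for one field: unknown overall
                total += count;
            }
            return total;
        }
    }

    /** Models a format that stores a single total and nothing per field. */
    static class TotalOnlyFields extends SimpleFields {
        private final long storedTotal;
        TotalOnlyFields(long storedTotal) { this.storedTotal = storedTotal; }
        @Override long getUniqueTermCount() { return storedTotal; }
    }

    /** Recomputes the count by walking every term. */
    static long recount(SimpleTerms terms) {
        long n = 0;
        for (Iterator<String> it = terms.iterator(); it.hasNext(); it.next()) n++;
        return n;
    }

    /** Returns one message per mismatch; an empty list means the counts agree. */
    static List<String> check(SimpleFields fields) {
        List<String> problems = new ArrayList<>();
        long recomputedTotal = 0;
        for (Map.Entry<String, SimpleTerms> e : fields.byField.entrySet()) {
            SimpleTerms terms = e.getValue();
            if (terms == null) continue;
            long recomputed = recount(terms);
            recomputedTotal += recomputed;
            long reported = terms.getUniqueTermCount();
            if (reported != -1 && reported != recomputed) {   // per-field check
                problems.add("field " + e.getKey() + ": reported " + reported
                        + " != recomputed " + recomputed);
            }
        }
        long reportedTotal = fields.getUniqueTermCount();
        if (reportedTotal != -1 && reportedTotal != recomputedTotal) {  // all-fields check
            problems.add("all fields: reported " + reportedTotal
                    + " != recomputed " + recomputedTotal);
        }
        return problems;
    }

    public static void main(String[] args) {
        SimpleFields fields = new SimpleFields();
        fields.byField.put("title", new ListTerms(2, "apache", "lucene"));
        fields.byField.put("empty", null);
        System.out.println(check(fields));
    }
}
```

In this sketch a -1 count is treated as "cannot verify" rather than as a failure, which is an assumption of the example; a real check might prefer to report it.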