[ https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698851#action_12698851 ]
Michael McCandless commented on LUCENE-1603:
--------------------------------------------

I was thinking that this count is a good way to measure how much net work was done, hence the switch to summing it. E.g., you could compare that count against the count you get after optimizing the index, to get a sense of how much you gained by optimizing. Whereas now, the count only shows the number of terms from the last segment searched, which is not really useful at all.

bq. Are queries also rewritten per segment with the new Searchers? If not, one could use the BooleanQuery variant, if he wants to have real term numbers on an unoptimized index.

They are rewritten at the MultiReader level, so you're right: one could use that to get "number of unique terms" vs. "amount of work (seeks) done".

If we do change it, how about "get/clearTotalNumberOfTerms()"?

> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1603
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1603.patch
>
>
> This is a patch that is needed for the MultiTermQuery rewrite of TrieRange
> (LUCENE-1602):
> - Make the private members protected, to have access to them from the very
> special TrieRangeTermEnum
> - Fix a small inconsistency (docFreq() now only returns a value if a valid
> term exists)
> - Improvement of MultiTermFilter.getDocIdSet to return
> DocIdSet.EMPTY_DOCIDSET if the TermEnum is empty (less memory usage, and
> faster).
> - Add getLastNumberOfTerms() to MultiTermQuery for statistics on
> different multi-term queries and how many terms they affect. Using this new
> functionality, the improvement of TrieRange can be shown (extract from the test
> case there, 10000-doc index, long values):
> {code}
> [junit] Average number of terms during random search on 'field8':
> [junit]  Trie query: 244.2
> [junit]  Classical query: 3136.94
> [junit] Average number of terms during random search on 'field4':
> [junit]  Trie query: 38.3
> [junit]  Classical query: 3018.68
> [junit] Average number of terms during random search on 'field2':
> [junit]  Trie query: 18.04
> [junit]  Classical query: 3539.42
> {code}
> All core tests pass.
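The difference between the two counting semantics discussed in the comment above (count from the last segment searched vs. sum over all segments) can be illustrated with a minimal standalone sketch. It assumes a search visits segments one at a time and that each segment reports how many terms its enumeration visited. The class `TermCountStats` and its `recordSegment` method are hypothetical; only the method names `getLastNumberOfTerms` and `get/clearTotalNumberOfTerms` come from this thread, and this is not the code in the patch or in Lucene's MultiTermQuery.

```java
import java.util.List;

/**
 * Hypothetical illustration (not Lucene's MultiTermQuery code): contrasts
 * "last segment only" term counting with "summed across segments" counting.
 */
public class TermCountStats {
    private int lastNumberOfTerms = 0;   // overwritten by each segment
    private int totalNumberOfTerms = 0;  // accumulated across segments

    /** Hypothetical hook: called once per segment with the terms its enumeration visited. */
    public void recordSegment(int termsVisited) {
        lastNumberOfTerms = termsVisited;
        totalNumberOfTerms += termsVisited;
    }

    public int getLastNumberOfTerms() {
        return lastNumberOfTerms;
    }

    public int getTotalNumberOfTerms() {
        return totalNumberOfTerms;
    }

    /** Resets only the running total, e.g. before measuring another query. */
    public void clearTotalNumberOfTerms() {
        totalNumberOfTerms = 0;
    }

    public static void main(String[] args) {
        // Hypothetical per-segment term counts for one query on an unoptimized index.
        List<Integer> perSegment = List.of(120, 80, 44);
        TermCountStats stats = new TermCountStats();
        for (int n : perSegment) {
            stats.recordSegment(n);
        }
        // prints: last=44 total=244
        System.out.println("last=" + stats.getLastNumberOfTerms()
                + " total=" + stats.getTotalNumberOfTerms());

        // Hypothetical: after optimizing, the same query sees a single segment.
        stats.clearTotalNumberOfTerms();
        stats.recordSegment(200);
        System.out.println("total after optimize=" + stats.getTotalNumberOfTerms());
    }
}
```

With the summed count, comparing the total before and after an optimize gives the "net work" comparison described in the comment; the last-segment value depends only on whichever segment happened to be searched last.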