[jira] Commented: (LUCENE-1603) Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement

Michael McCandless (JIRA) Tue, 14 Apr 2009 10:49:39 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698851#action_12698851
 ]


Michael McCandless commented on LUCENE-1603:
--------------------------------------------

I was thinking that this count is a good way to measure how much net work was 
done, hence the switch to sum.  E.g., you could compare that count against the 
count you get after optimizing the index, to get a sense of how much you gained 
by optimizing.

Whereas now the count only shows the number of terms from the last segment 
searched, which is not really useful at all.

bq. Are queries also rewritten per segment with the new Searchers? If not, one 
could use the BooleanQuery variant, if he wants to have real term numbers on 
unoptimized index.

They are rewritten at the MultiReader level, so you're right: one could use 
that to get "number of unique terms" vs "amount of work (seeks) done".

If we do change it, how about "get/clearTotalNumberOfTerms()"?
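
A minimal sketch of the summing semantics discussed above, assuming a 
hypothetical stand-alone counter class (TermCountSketch, recordSegment and 
search are illustrative names, not the actual MultiTermQuery code; only the 
get/clearTotalNumberOfTerms names come from the proposal). The per-segment 
term counts are made-up numbers:

```java
/** Hypothetical stand-in for the proposed statistic; not Lucene's MultiTermQuery. */
final class TermCountSketch {
    // Running sum over all segments visited since the last clear.
    private int totalNumberOfTerms = 0;

    /** Assumed hook: called once per segment with the number of terms enumerated there. */
    void recordSegment(int termsVisited) {
        totalNumberOfTerms += termsVisited;
    }

    int getTotalNumberOfTerms() {
        return totalNumberOfTerms;
    }

    void clearTotalNumberOfTerms() {
        totalNumberOfTerms = 0;
    }

    /** Simulates one search: each argument is the term count visited in one segment. */
    static int search(TermCountSketch counter, int... termsPerSegment) {
        for (int terms : termsPerSegment) {
            counter.recordSegment(terms);
        }
        return counter.getTotalNumberOfTerms();
    }

    public static void main(String[] args) {
        TermCountSketch counter = new TermCountSketch();

        // Unoptimized index: three segments, made-up per-segment counts.
        int unoptimized = search(counter, 40, 35, 25);

        // Reset before the next measurement, otherwise the sums would mix.
        counter.clearTotalNumberOfTerms();

        // Optimized index: a single segment, again a made-up count.
        int optimized = search(counter, 60);

        System.out.println("unoptimized total = " + unoptimized);
        System.out.println("optimized total   = " + optimized);
    }
}
```

With a "last segment only" counter, the unoptimized run would report just the 
final segment's count, so the comparison against the optimized index would 
say little about the work actually done.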

> Changes for TrieRange in FilteredTermEnum and MultiTermQuery improvement
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1603
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1603
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4, 2.9
>            Reporter: Uwe Schindler
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1603.patch
>
>
> This is a patch, that is needed for the MultiTermQuery-rewrite of TrieRange 
> (LUCENE-1602):
> - Make the private members protected, to have access to them from the very 
> special TrieRangeTermEnum 
> - Fix a small inconsistency (docFreq() now only returns a value if a valid 
> term exists)
> - Improvement of MultiTermFilter.getDocIdSet to return 
> DocIdSet.EMPTY_DOCIDSET if the TermEnum is empty (less memory usage and 
> faster).
> - Add the getLastNumberOfTerms() to MultiTermQuery for statistics on 
> different multi term queries and how many terms they affect. Using this new 
> functionality, the improvement of TrieRange can be shown (extract from test 
> case there, 10000 docs index, long values):
> {code}
> [junit] Average number of terms during random search on 'field8':
> [junit]  Trie query: 244.2
> [junit]  Classical query: 3136.94
> [junit] Average number of terms during random search on 'field4':
> [junit]  Trie query: 38.3
> [junit]  Classical query: 3018.68
> [junit] Average number of terms during random search on 'field2':
> [junit]  Trie query: 18.04
> [junit]  Classical query: 3539.42
> {code}
> All core tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


