[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724466#action_12724466 ]
Uwe Schindler commented on LUCENE-1461:
---------------------------------------

I did some performance tests comparing this filter with TrieRange (precStep 8) on a 5 million document index with int values uniformly distributed between Integer.MIN_VALUE and Integer.MAX_VALUE, using 200 queries with random bounds in the same range. The platform was Win32 with 1.5 GB RAM on my Thinkpad T60 Core Duo (not 2 Duo!), Java 1.5:

loading field cache time: 11826.602264 ms
Warming searcher...
avg number of terms: 414.365
TRIE: best time=4.51482 ms; worst time=1560.544985 ms; avg=470.56886981499997 ms; sum=323328111
FIELDCACHE: best time=314.611773 ms; worst time=878.438461 ms; avg=511.93189495499996 ms; sum=323328111

This test shows that, with a well-warmed searcher and the whole index in the OS cache, the two filters are about the same speed. A conventional constant-score range query is far behind (about 10 to 1000 times slower, depending on how far apart the random range bounds are).

The same test with the old patch (using no TermDocs) and a completely separate loop (no matchDoc() method call); only in this configuration does the FieldCache filter beat the trie filter:

loading field cache time: 12134.143027 ms
Warming searcher...
avg number of terms: 403.785
TRIE: best time=3.890159 ms; worst time=1266.979462 ms; avg=453.553236545 ms; sum=308154314
FIELDCACHE: best time=84.019897 ms; worst time=434.558023 ms; avg=235.91554798500002 ms; sum=308154314

Both test runs show that the queries work correctly (the sums are identical, which shows that both filters returned exactly the same hits). In all cases I would still prefer TrieRange (hihi), especially because of the long warming time for the field cache.
And TrieRange gets a bit better with lower precSteps, though not by much (in constant score mode the bit sets are the bigger problem).

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: DisjointMultiFilter.java, FieldCacheRangeFilter.patch,
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch,
> LUCENE-1461.patch, LUCENE-1461a.patch, LUCENE-1461b.patch,
> LUCENE-1461c.patch, RangeMultiFilter.java, RangeMultiFilter.java,
> TermMultiFilter.java, TestFieldCacheRangeFilter.patch
>
>
> These classes implement inexpensive range filtering over a field containing a
> single term. They do this by building an integer array of term numbers
> (storing the term->number mapping in a TreeMap) and then implementing a fast
> integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also
> be used to do other date filtering or in any application where there need to
> be multiple filters based on the same single term field. I have an untested
> implementation of single term filtering and have considered but not yet
> implemented term set filtering (useful for location based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and
> hashCode() methods etc. I'm posting it here to discover if there is other
> interest in this feature; I don't mind fixing it up but would hate to go to
> the effort if it's not going to make it into Lucene.
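
For readers who have not looked at the attachments, the approach in the quoted description (an int array of term numbers per document, a TreeMap for the term->number mapping, and a plain integer comparison per document) can be sketched roughly as below. This is only an illustration under assumptions: the class name OrdinalRangeFilterSketch, the String[] input, and the BitSet result are hypothetical stand-ins and are not taken from the patches, which work against Lucene's own reader and iterator classes.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of ordinal-based range filtering; NOT the attached patch code.
final class OrdinalRangeFilterSketch {
    // term -> ordinal (position of the term in sorted order)
    private final TreeMap<String, Integer> termToOrd = new TreeMap<>();
    // docToOrd[doc] = ordinal of the single term stored for that document
    private final int[] docToOrd;

    // termPerDoc[doc] is the one term of the field for that document (assumed input shape).
    OrdinalRangeFilterSketch(String[] termPerDoc) {
        // Number the distinct terms in sorted order.
        int next = 0;
        for (String term : new TreeSet<>(java.util.Arrays.asList(termPerDoc))) {
            termToOrd.put(term, next++);
        }
        // Build the per-document integer array once; this is the "cache".
        docToOrd = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docToOrd[doc] = termToOrd.get(termPerDoc[doc]);
        }
    }

    // Returns the documents whose term lies in [lower, upper], both bounds inclusive.
    BitSet range(String lower, String upper) {
        BitSet hits = new BitSet(docToOrd.length);
        // Translate the term bounds into ordinal bounds with two TreeMap lookups.
        Map.Entry<String, Integer> lo = termToOrd.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToOrd.floorEntry(upper);
        if (lo == null || hi == null || lo.getValue() > hi.getValue()) {
            return hits; // no indexed term falls inside the range
        }
        int loOrd = lo.getValue();
        int hiOrd = hi.getValue();
        // The per-document work is just two integer comparisons.
        for (int doc = 0; doc < docToOrd.length; doc++) {
            int ord = docToOrd[doc];
            if (ord >= loOrd && ord <= hiOrd) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Zero-padded ages so that string order matches numeric order.
        OrdinalRangeFilterSketch f = new OrdinalRangeFilterSketch(
                new String[] {"030", "018", "025", "018", "041"});
        System.out.println(f.range("020", "035")); // prints {0, 2}
    }
}
```

Building the array is the expensive step, which matches the long "loading field cache time" in the measurements above; once it exists, any number of range filters on the same field can reuse it.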