[
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648869#action_12648869
]
Tim Sturge commented on LUCENE-1461:
------------------------------------
Here's some benchmark data to demonstrate the utility. Results on a 45M
document index:
First, without an age constraint, as a baseline:
Query "+name:tim"
startup: 0
Hits: 15089
first query: 1004
100 queries: 132 (1.32 msec per query)
Now with a cached filter. This is ideal from a speed standpoint, but as with
most range-based queries, there are too many possible start/end combinations
to cache all the filters.
Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on cached RangeFilter)
startup: 3
Hits: 11156
first query: 1830
100 queries: 287 (2.87 msec per query)
Now with an uncached filter. This is awful.
Query "+name:tim age:[18 TO 35]" (uncached ConstantScoreRangeQuery)
startup: 3
Hits: 11156
first query: 1665
100 queries: 51862 (yes, 518 msec per query, about 180x slower than the cached filter)
A RangeQuery is somewhat better but still bad (and has a different result set):
Query "+name:tim age:[18 TO 35]" (uncached RangeQuery)
startup: 0
Hits: 10147
first query: 1517
100 queries: 27157 (271 msec per query, about 95x slower than the cached filter)
Now with the prebuilt column stride filter:
Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on prebuilt column stride filter)
startup: 2811
Hits: 11156
first query: 1395
100 queries: 441 (back down to 4.41 msec per query)
This is about 1.5x slower than the dedicated bitset (the cached RangeFilter)
and about 60x faster than the uncached RangeQuery.
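The ratios can be rechecked from the "100 queries" totals above. The following is only a small arithmetic sketch; the class and method names are made up for illustration, and the only inputs are the totals reported in this comment.

```java
public class BenchmarkRatios {
    // Per-query latency, given the reported total for 100 queries.
    static double perQuery(long totalFor100Queries) {
        return totalFor100Queries / 100.0;
    }

    // How many times slower one run is than another.
    static double ratio(long slowerTotal, long fasterTotal) {
        return (double) slowerTotal / fasterTotal;
    }

    public static void main(String[] args) {
        // Totals for 100 queries, copied from the results above.
        long cachedFilter = 287;
        long uncachedConstantScoreRangeQuery = 51862;
        long uncachedRangeQuery = 27157;
        long columnStrideFilter = 441;

        System.out.printf("column stride filter, per query: %.2f%n",
                perQuery(columnStrideFilter));
        System.out.printf("uncached ConstantScoreRangeQuery vs cached filter: %.1fx%n",
                ratio(uncachedConstantScoreRangeQuery, cachedFilter));
        System.out.printf("uncached RangeQuery vs cached filter: %.1fx%n",
                ratio(uncachedRangeQuery, cachedFilter));
        System.out.printf("column stride filter vs cached filter: %.2fx%n",
                ratio(columnStrideFilter, cachedFilter));
        System.out.printf("uncached RangeQuery vs column stride filter: %.1fx%n",
                ratio(uncachedRangeQuery, columnStrideFilter));
    }
}
```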
> Cached filter for a single term field
> -------------------------------------
>
> Key: LUCENE-1461
> URL: https://issues.apache.org/jira/browse/LUCENE-1461
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Tim Sturge
> Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a
> single term. They do this by building an integer array of term numbers
> (storing the term->number mapping in a TreeMap) and then implementing a fast
> integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also
> be used to do other date filtering or in any application where there need to
> be multiple filters based on the same single term field. I have an untested
> implementation of single term filtering and have considered but not yet
> implemented term set filtering (useful for location based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and
> hashCode() methods etc. I'm posting it here to discover if there is other
> interest in this feature; I don't mind fixing it up but would hate to go to
> the effort if it's not going to make it into Lucene.
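The idea in the description can be illustrated with a minimal sketch. This is not the attached RangeMultiFilter.java: it is plain Java with no Lucene dependency, the class and method names are hypothetical, and it assumes every document has exactly one term in the field. It numbers the distinct terms in sorted order using a TreeMap, stores one term number per document in an int array, and answers a range request with two integer comparisons per document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-alone illustration; not the code attached to the issue. */
public class TermNumberRangeSketch {
    // term -> number, where numbers follow sorted (lexicographic) term order
    private final TreeMap<String, Integer> termToNumber = new TreeMap<>();
    // one term number per document, indexed by document id
    private final int[] docTermNumbers;

    /** termPerDoc[doc] is the single term of the field for that document. */
    public TermNumberRangeSketch(String[] termPerDoc) {
        for (String term : termPerDoc) {
            termToNumber.put(term, 0);          // collect distinct terms
        }
        int next = 0;
        for (Map.Entry<String, Integer> e : termToNumber.entrySet()) {
            e.setValue(next++);                 // number terms in sorted order
        }
        docTermNumbers = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docTermNumbers[doc] = termToNumber.get(termPerDoc[doc]);
        }
    }

    /** Returns ids of documents whose term lies in [lowerTerm, upperTerm]. */
    public List<Integer> matchingDocs(String lowerTerm, String upperTerm) {
        List<Integer> hits = new ArrayList<>();
        // Translate the term bounds into term-number bounds once per request.
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lowerTerm);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upperTerm);
        if (lo == null || hi == null || lo.getValue() > hi.getValue()) {
            return hits;                        // no indexed term falls in the range
        }
        int low = lo.getValue();
        int high = hi.getValue();
        // The per-document work is just two integer comparisons.
        for (int doc = 0; doc < docTermNumbers.length; doc++) {
            int n = docTermNumbers[doc];
            if (n >= low && n <= high) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

For example, with ages {"17", "18", "25", "35", "36", "25"} for documents 0 through 5, `matchingDocs("18", "35")` selects documents 1, 2, 3 and 5. Because terms are compared as strings, numeric values would need a fixed width (for instance zero padding) for the ordering to match numeric order. The array is built once and can then serve any start/end combination, which is the case where caching one filter per range does not scale.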