[jira] Commented: (LUCENE-1461) Cached filter for a single term field

Tim Sturge (JIRA) Tue, 18 Nov 2008 17:28:36 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648869#action_12648869
 ]


Tim Sturge commented on LUCENE-1461:
------------------------------------

Here's some benchmark data to demonstrate the utility. Results on a 45M 
document index (all times in msec):

First, without an age constraint, as a baseline:

Query "+name:tim" 
startup: 0 
Hits: 15089
first query: 1004
100 queries: 132 (1.32 msec per query)

Now with a cached filter. This is ideal from a speed standpoint, but as with 
most range-based queries, there are too many possible start/end combinations 
to cache all the filters.

Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on cached RangeFilter)
startup: 3
Hits: 11156
first query: 1830
100 queries: 287 (2.87 msec per query)

Now with an uncached filter. This is awful.

Query "+name:tim age:[18 TO 35]" (uncached ConstantScoreRangeQuery)
startup: 3
Hits: 11156
first query: 1665
100 queries: 51862 (yes, 518 msec per query, roughly 180x slower than the cached filter)

A RangeQuery is about 2x faster but still bad (and has a different result set).

Query "+name:tim age:[18 TO 35]" (uncached RangeQuery)
startup: 0
Hits: 10147
first query: 1517
100 queries: 27157 (271 msec is 100x slower than the filter)

Now with the prebuilt column stride filter:

Query "+name:tim age:[18 TO 35]" (ConstantScoreQuery on prebuilt column stride 
filter)
startup: 2811
Hits: 11156
first query: 1395
100 queries: 441 (back down to 4.41 msec per query)

This is about 1.5x slower than the dedicated cached bitset (4.41 vs. 2.87 
msec per query) and more than 60x faster than the uncached RangeQuery 
(271 msec per query).



> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, RangeMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.
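
For readers without the attachments, the approach described above can be 
sketched roughly as follows. This is only an illustration, not the code in 
RangeMultiFilter.java: the class name TermNumberRangeFilter and the method 
matchingDocs are hypothetical, the per-document terms are passed in as a 
plain String array instead of being read from an index, and the result is 
returned as an int array of document ids where the real code would expose a 
DocIdSetIterator. Terms are compared as strings, so numeric values such as 
ages are assumed to be stored with a fixed width.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of range filtering over a single-term field. */
final class TermNumberRangeFilter {
    // term -> term number; numbers are assigned in sorted term order
    private final TreeMap<String, Integer> termToNumber = new TreeMap<>();
    // docTermNumber[doc] is the number of that document's term, or -1 if none
    private final int[] docTermNumber;

    /** Built once up front; this corresponds to the "startup" cost. */
    TermNumberRangeFilter(String[] termPerDoc) {
        for (String term : termPerDoc) {
            if (term != null) {
                termToNumber.put(term, 0); // placeholder, numbered below
            }
        }
        int next = 0;
        for (Map.Entry<String, Integer> e : termToNumber.entrySet()) {
            e.setValue(next++); // sorted iteration, so term order == number order
        }
        docTermNumber = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            String term = termPerDoc[doc];
            docTermNumber[doc] = (term == null) ? -1 : termToNumber.get(term);
        }
    }

    /** Returns ids of documents whose term lies in [lower, upper], inclusive. */
    int[] matchingDocs(String lower, String upper) {
        // Map the string bounds to term numbers once per query.
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upper);
        if (lo == null || hi == null || lo.getValue() > hi.getValue()) {
            return new int[0]; // no indexed term falls inside the range
        }
        int min = lo.getValue();
        int max = hi.getValue();
        int[] hits = new int[docTermNumber.length];
        int count = 0;
        // The per-document check is two integer comparisons.
        for (int doc = 0; doc < docTermNumber.length; doc++) {
            int number = docTermNumber[doc];
            if (number >= min && number <= max) {
                hits[count++] = doc;
            }
        }
        return Arrays.copyOf(hits, count);
    }
}
```

With per-document ages {"25", "17", "40", "18", "35", null}, calling 
matchingDocs("18", "35") selects documents 0, 3 and 4. The array and map are 
built once, so any number of different ranges over the same field can reuse 
them, which is the case where caching one bitset per range is impractical.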

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

