[jira] Commented: (LUCENE-1461) Cached filter for a single term field

Earwin Burrfoot (JIRA) Tue, 25 Nov 2008 23:29:37 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650882#action_12650882
 ]


Earwin Burrfoot commented on LUCENE-1461:
-----------------------------------------

Somewhat off topic, but nonetheless, here are my two techniques for superfast
range queries/filters:
1. Cache [from, null] and [null, to] filters instead of [from, to], and
intersect them at query time.
-> This can tremendously improve cache hit rates for certain setups, because
each bound is cached independently of the other.
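
To illustrate technique 1, here is a minimal sketch in plain Java. It does not
use any Lucene API: the class HalfRangeFilterCache, its int[] of per-document
values, and the `computed` counter are all hypothetical names chosen for this
example, and java.util.BitSet stands in for whatever doc-id set a real filter
would produce.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not Lucene API.
// Assumes one int value per document, addressed by doc id.
class HalfRangeFilterCache {
    private final int[] values;                                     // values[docId] = field value
    private final Map<Integer, BitSet> fromCache = new HashMap<>(); // [from, null]
    private final Map<Integer, BitSet> toCache = new HashMap<>();   // [null, to]
    int computed = 0;                                               // full scans actually performed

    HalfRangeFilterCache(int[] values) {
        this.values = values;
    }

    // Scans every document once and builds the bitset for one open-ended bound.
    private BitSet scan(int bound, boolean isLowerBound) {
        computed++;
        BitSet bits = new BitSet(values.length);
        for (int doc = 0; doc < values.length; doc++) {
            boolean match = isLowerBound ? values[doc] >= bound : values[doc] <= bound;
            if (match) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Returns the docs with from <= value <= to by intersecting two cached halves.
    BitSet range(int from, int to) {
        BitSet lower = fromCache.computeIfAbsent(from, f -> scan(f, true));
        BitSet upper = toCache.computeIfAbsent(to, t -> scan(t, false));
        BitSet result = (BitSet) lower.clone(); // never mutate the cached copy
        result.and(upper);
        return result;
    }

    public static void main(String[] args) {
        HalfRangeFilterCache cache =
                new HalfRangeFilterCache(new int[] {18, 25, 31, 40, 52, 67});
        System.out.println(cache.range(20, 40));
        System.out.println(cache.range(20, 60)); // reuses the cached [20, null] half
        System.out.println("scans: " + cache.computed);
    }
}
```

With n distinct lower bounds and m distinct upper bounds, at most n + m cached
bitsets can serve up to n * m different ranges, at the cost of one extra
intersection per query.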

2. When indexing a field that will be used for range filters, additionally
index lower-resolution versions of it, then use a union of range filters over
the different-resolution fields, i.e.:
a. We have several million documents with a date field spanning a few years
at, say, minute precision (we'd like to sort on it afterward).
b. We index additional fields with the dates rounded down to something like
years, months, days, hours (the best combination depends on the width of the
queries you're most likely to perform; let's say it's day+hour for queries
rarely spanning more than a month).
c. A query like [2008-05-05 18:00 .. 2008-06-01 10:53] is converted to ->
hour:[05-05 18 .. 05-06 00) or day:[05-06 .. 06-01) or
hour:[06-01 00 .. 06-01 10) or minute:[06-01 10:00 .. 06-01 10:53]
-> Massive win for ranges over fields having lots of high-selectivity terms,
with timestamps being a good example; also salaries, coordinates, whatever.
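
The conversion in step c can be sketched as follows. This is only an
illustration of the splitting logic, assuming the day+hour+minute combination
from step b; the class MultiResolutionRange, the field names "day", "hour" and
"minute", and the string form of each clause are hypothetical. Every clause is
written as a half-open interval [lo, hi), so the inclusive upper bound 10:53
appears as 10:54 exclusive. In a real index each clause would become a range
filter on the corresponding rounded-down field.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a minute-precision range
// into sub-ranges that can each be answered from a coarser field.
class MultiResolutionRange {
    // Resolutions from coarsest to finest; the finest is the original field.
    private static final ChronoUnit[] UNITS =
            {ChronoUnit.DAYS, ChronoUnit.HOURS, ChronoUnit.MINUTES};
    private static final String[] FIELDS = {"day", "hour", "minute"};

    // Both bounds are inclusive and are truncated to whole minutes.
    static List<String> decompose(LocalDateTime from, LocalDateTime toInclusive) {
        LocalDateTime start = from.truncatedTo(ChronoUnit.MINUTES);
        LocalDateTime end = toInclusive.truncatedTo(ChronoUnit.MINUTES).plusMinutes(1);
        List<String> clauses = new ArrayList<>();
        split(start, end, 0, clauses);
        return clauses;
    }

    // Covers [start, end) using the resolution at 'level' where whole units
    // fit, and recurses into finer resolutions for the ragged edges.
    private static void split(LocalDateTime start, LocalDateTime end,
                              int level, List<String> out) {
        if (!start.isBefore(end)) {
            return; // empty interval
        }
        ChronoUnit unit = UNITS[level];
        if (level == UNITS.length - 1) {
            out.add(clause(level, start, end)); // finest field takes the rest
            return;
        }
        LocalDateTime lo = ceil(start, unit);      // first whole unit inside the range
        LocalDateTime hi = end.truncatedTo(unit);  // end of the last whole unit
        if (lo.isBefore(hi)) {
            split(start, lo, level + 1, out);
            out.add(clause(level, lo, hi));
            split(hi, end, level + 1, out);
        } else {
            split(start, end, level + 1, out);     // no whole unit fits; go finer
        }
    }

    // Rounds up to the next unit boundary unless t already lies on one.
    private static LocalDateTime ceil(LocalDateTime t, ChronoUnit unit) {
        LocalDateTime floor = t.truncatedTo(unit);
        return floor.equals(t) ? t : floor.plus(1, unit);
    }

    private static String clause(int level, LocalDateTime lo, LocalDateTime hi) {
        return FIELDS[level] + ":[" + lo + " .. " + hi + ")";
    }

    public static void main(String[] args) {
        List<String> clauses = decompose(
                LocalDateTime.of(2008, 5, 5, 18, 0),
                LocalDateTime.of(2008, 6, 1, 10, 53));
        System.out.println(String.join(" or ", clauses));
    }
}
```

For the query in step c this yields four clauses (hour, day, hour, minute), so
the filter enumerates a few dozen coarse terms instead of every minute-level
term in the range.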

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>         Attachments: DisjointMultiFilter.java, LUCENE-1461a.patch, 
> LUCENE-1461b.patch, RangeMultiFilter.java, RangeMultiFilter.java, 
> TermMultiFilter.java
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.

