[jira] Commented: (LUCENE-1461) Cached filter for a single term field

Uwe Schindler (JIRA) Tue, 30 Jun 2009 01:05:15 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725518#action_12725518
 ]


Uwe Schindler commented on LUCENE-1461:
---------------------------------------

The latest patch no longer seems to have any problems. Without deletions, or 
if 0 is not inside the range, it seems to be faster than trie range, at the 
cost of a long first search while the cache loads. But if you e.g. also search 
on this field, or use the cache for something else, that may not be a problem.

The biggest advantage is that you do not need to index the values in a special 
way: you can simply keep your old Number.toString()-formatted fields and run 
range queries on them. For term/string ranges it works better than 
RangeFilter, but with StringIndex the memory usage is much higher (if you have 
lots of distinct terms).

In the future, it might also be possible to implement a TrieRangeQuery for 
Strings. The precisionStep would then be the number of chars per precision 
level, so with precStep=2 the term "lucene" would be indexed as the tokens 
"lu", "luce", "lucene". As with TrieRange, a TokenStream that does this would 
be good.
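
To make the "lucene" example concrete, here is a minimal sketch of the prefix 
generation only. It is not a Lucene TokenStream and not part of any patch; the 
class and method names are hypothetical, and it assumes the full term is 
always emitted as the last token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not a Lucene class).
final class StringPrefixTokens {

    /**
     * Returns the prefixes of term whose lengths are multiples of
     * precisionStep, followed by the full term itself.
     */
    static List<String> prefixes(String term, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<String> tokens = new ArrayList<String>();
        // Emit every proper prefix that ends on a precisionStep boundary.
        for (int len = precisionStep; len < term.length(); len += precisionStep) {
            tokens.add(term.substring(0, len));
        }
        // Assumption: the complete term is always the last, most precise token.
        if (!term.isEmpty()) {
            tokens.add(term);
        }
        return tokens;
    }
}
```

With precisionStep=2, prefixes("lucene", 2) yields "lu", "luce", "lucene", 
matching the tokens listed above.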

In my opinion, this class is a good approach for range queries if you have 
enough RAM and warm your searchers correctly, but do not want to change your 
index structure to use the new TrieRange. It is not a good fit for indexes 
where each range hits only a few documents, because the cost of the linear 
scan (for all data types) then dominates.
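
The linear scan mentioned above can be pictured with a small sketch. This is 
not the code from the patch: a plain int[] stands in for the per-document 
array that the cache would hold, and the class and method names are 
hypothetical.

```java
import java.util.BitSet;

// Hypothetical illustration of a range check over cached per-document values.
final class CachedIntRangeScan {

    /** Marks every document whose cached value lies in [lower, upper]. */
    static BitSet matchingDocs(int[] valueByDoc, int lower, int upper) {
        BitSet hits = new BitSet(valueByDoc.length);
        // Every document is visited once, so the work is proportional to the
        // number of documents, not to the number of hits.
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            int value = valueByDoc[doc];
            if (value >= lower && value <= upper) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

The loop touches every slot even when only a handful of documents match, 
which is why ranges with few hits are the unfavourable case.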

I think I will commit this later, if nobody objects. If you think that only 
StringIndex, and not numeric values, should be handled by this class (i.e. 
throw away the new code), I tend to rename this class before release, in line 
with LUCENE-1713.

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: DisjointMultiFilter.java, FieldCacheRangeFilter.patch, 
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, 
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461a.patch, 
> LUCENE-1461b.patch, LUCENE-1461c.patch, PerfTest.java, RangeMultiFilter.java, 
> RangeMultiFilter.java, TermMultiFilter.java, TestFieldCacheRangeFilter.patch
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer comparison based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.
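
As a rough illustration of the idea in the issue description (number the terms 
in sorted order, store one number per document, then filter by integer 
comparison), here is a sketch in plain Java. It is not the attached code; all 
names are hypothetical, and it assumes every document has exactly one non-null 
term in the field.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not one of the attached classes.
final class TermOrdinalRange {
    private final TreeMap<String, Integer> ordinalByTerm = new TreeMap<String, Integer>();
    private final int[] ordinalByDoc;

    /** termByDoc[doc] is the single term of that document (assumed non-null). */
    TermOrdinalRange(String[] termByDoc) {
        // Number the distinct terms 0..n-1 in sorted order.
        int next = 0;
        for (String term : new TreeSet<String>(Arrays.asList(termByDoc))) {
            ordinalByTerm.put(term, next++);
        }
        ordinalByDoc = new int[termByDoc.length];
        for (int doc = 0; doc < termByDoc.length; doc++) {
            ordinalByDoc[doc] = ordinalByTerm.get(termByDoc[doc]);
        }
    }

    /** Documents whose term lies in [lowerTerm, upperTerm], both inclusive. */
    BitSet range(String lowerTerm, String upperTerm) {
        BitSet hits = new BitSet(ordinalByDoc.length);
        // Translate the term bounds to term numbers once per filter.
        Map.Entry<String, Integer> lo = ordinalByTerm.ceilingEntry(lowerTerm);
        Map.Entry<String, Integer> hi = ordinalByTerm.floorEntry(upperTerm);
        if (lo == null || hi == null) {
            return hits; // no indexed term falls inside the bounds
        }
        int loOrd = lo.getValue();
        int hiOrd = hi.getValue();
        // Per document, the check is now two integer comparisons.
        for (int doc = 0; doc < ordinalByDoc.length; doc++) {
            int ord = ordinalByDoc[doc];
            if (ord >= loOrd && ord <= hiOrd) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

The string comparisons happen only when the bounds are looked up; several 
filters over the same field can share the one ordinalByDoc array.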

