[jira] Updated: (LUCENE-1461) Cached filter for a single term field

Uwe Schindler (JIRA) Fri, 26 Jun 2009 15:24:11 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Uwe Schindler updated LUCENE-1461:
----------------------------------

    Attachment: LUCENE-1461.patch

Attached is a new patch that has two DocIdSetIterator implementations, one with 
TermDocs and one without. The TermDocs one is chosen only for numeric types, and 
only if the reader contains deletions *and* 0 is inside the range. For all other 
cases (including StringIndex), the simple DocIdSetIterator using a counter is used.
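
The selection rule above can be sketched roughly as follows. This is not code from the patch: IteratorChoice and its methods are hypothetical names, a BitSet of deleted documents stands in for TermDocs, and the sketch assumes that deleted documents read as 0 from the cached values, which would explain why the check is only needed when 0 lies inside the range.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for the iterator selection; not Lucene code. */
final class IteratorChoice {

    /** Deleted docs can only produce false hits when 0 is inside the range (assumption). */
    static boolean needsDeletionAwareIterator(boolean hasDeletions, int lower, int upper) {
        return hasDeletions && lower <= 0 && 0 <= upper;
    }

    /** Collects matching doc IDs from cached per-document values, inclusive bounds. */
    static List<Integer> collect(int[] values, BitSet deleted, int lower, int upper) {
        boolean skipDeleted = needsDeletionAwareIterator(!deleted.isEmpty(), lower, upper);
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < values.length; doc++) {       // the plain counter
            if (skipDeleted && deleted.get(doc)) {
                continue;                                      // stands in for the TermDocs variant
            }
            if (values[doc] >= lower && values[doc] <= upper) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With values {5, 0, 7, 0} and document 1 deleted, the range [0, 6] takes the deletion-aware path and skips document 1, while the range [4, 8] cannot match a 0 and uses the plain counter.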

For more code reuse, all range implementations now use the same abstract 
DocIdSet implementation and override only matchDoc(). My tests showed that using 
this method does not affect performance (the method is inlined): the original 
StringIndex implementation is as fast as the new one with matchDoc().
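
As an illustration of that structure, here is a minimal sketch with no Lucene dependency. RangeDocIdSet, nextMatch(), intRange() and longRange() are hypothetical names chosen for this example; only the idea of a shared abstract class whose subclasses override matchDoc() comes from the description above.

```java
/** Hypothetical simplification of a shared range doc-ID set; not the patch's actual class. */
abstract class RangeDocIdSet {
    private final int maxDoc;

    RangeDocIdSet(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** The only method a concrete range has to supply. */
    abstract boolean matchDoc(int doc);

    /** Counter-based iteration: first match after {@code doc}, or -1 when exhausted. */
    final int nextMatch(int doc) {
        for (int d = doc + 1; d < maxDoc; d++) {
            if (matchDoc(d)) {
                return d;
            }
        }
        return -1;
    }

    /** Inclusive range over cached int values. */
    static RangeDocIdSet intRange(final int[] values, final int lower, final int upper) {
        return new RangeDocIdSet(values.length) {
            @Override
            boolean matchDoc(int doc) {
                return values[doc] >= lower && values[doc] <= upper;
            }
        };
    }

    /** Inclusive range over cached long values; only matchDoc() differs. */
    static RangeDocIdSet longRange(final long[] values, final long lower, final long upper) {
        return new RangeDocIdSet(values.length) {
            @Override
            boolean matchDoc(int doc) {
                return values[doc] >= lower && values[doc] <= upper;
            }
        };
    }
}
```

Each matchDoc() body is small and called from a single final loop, which is the kind of call site a JIT compiler can usually inline.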

This patch also restores the original handling of the return value of 
binarySearch (which can be negative).
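
For readers unfamiliar with the convention, the following sketch shows how a negative binary-search result can be turned into inclusive bounds. It uses java.util.Arrays.binarySearch, which returns -(insertion point) - 1 when the key is absent, as a stand-in for the lookup in the patch; OrdinalBounds, lowerPoint() and upperPoint() are hypothetical names, and the patch's own handling may differ in detail.

```java
import java.util.Arrays;

/** Hypothetical helper: maps range bounds to positions in a sorted term array. */
final class OrdinalBounds {

    /** Position of the first term inside the range. */
    static int lowerPoint(String[] sortedTerms, String lower, boolean inclusive) {
        int idx = Arrays.binarySearch(sortedTerms, lower);
        if (idx < 0) {
            return -idx - 1;          // absent: insertion point is the first larger term
        }
        return inclusive ? idx : idx + 1;
    }

    /** Position of the last term inside the range; -1 means no term qualifies. */
    static int upperPoint(String[] sortedTerms, String upper, boolean inclusive) {
        int idx = Arrays.binarySearch(sortedTerms, upper);
        if (idx < 0) {
            return -idx - 2;          // absent: the term just before the insertion point
        }
        return inclusive ? idx : idx - 1;
    }
}
```

A range is empty whenever the computed lower point ends up greater than the upper point, so no special case is needed for bounds that fall outside the term array.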

Here is the comparison again:

*Version with TermDocs:*
loading field cache
time: 6767.23131 ms
Warming searcher...
avg number of terms: 378.75
TRIE: best time=5.232229 ms; worst time=553.791334 ms; avg=250.4418579 ms; sum=31996909
FIELDCACHE: best time=212.763912 ms; worst time=357.100414 ms; avg=279.75582110000005 ms; sum=31996909

*Version without (because index in testcase has no deletions):*
loading field cache
time: 6463.311678 ms
Warming searcher...
avg number of terms: 378.75
TRIE: best time=4.539963 ms; worst time=581.657446 ms; avg=246.58688465 ms; sum=31996909
FIELDCACHE: best time=64.747614 ms; worst time=211.557335 ms; avg=139.16517340000001 ms; sum=31996909

(My T60 was not on battery this time, so the measurement with TermDocs and the 
FieldCache loading were faster than before.) Both tests, before and after the 
optimization, were run with the same settings, and the random seed was identical (0L).

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: DisjointMultiFilter.java, FieldCacheRangeFilter.patch, 
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, 
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461a.patch, 
> LUCENE-1461b.patch, LUCENE-1461c.patch, PerfTest.java, RangeMultiFilter.java, 
> RangeMultiFilter.java, TermMultiFilter.java, TestFieldCacheRangeFilter.patch
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.
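
The approach in the quoted issue description can be sketched as follows. This is an illustration only, not the attached code: TermNumberCache, numberRange() and matches() are hypothetical names, and the input is assumed to be one term per document held in a plain array.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: per-document term numbers plus a sorted term->number map. */
final class TermNumberCache {
    final TreeMap<String, Integer> termToNumber = new TreeMap<>();
    final int[] docToNumber;

    TermNumberCache(String[] termPerDoc) {
        for (String term : termPerDoc) {
            termToNumber.put(term, 0);                 // collect distinct terms in sorted order
        }
        int next = 0;
        for (Map.Entry<String, Integer> e : termToNumber.entrySet()) {
            e.setValue(next++);                        // number terms by sort position
        }
        docToNumber = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docToNumber[doc] = termToNumber.get(termPerDoc[doc]);
        }
    }

    /** Inclusive number range covering [lower, upper], or null when no term falls inside. */
    int[] numberRange(String lower, String upper) {
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upper);
        if (lo == null || hi == null || lo.getValue() > hi.getValue()) {
            return null;
        }
        return new int[] {lo.getValue(), hi.getValue()};
    }

    /** The per-document check is two integer comparisons. */
    boolean matches(int doc, int[] range) {
        return range != null && docToNumber[doc] >= range[0] && docToNumber[doc] <= range[1];
    }
}
```

The string comparisons happen once per filter, when the bounds are resolved to numbers; after that, each document costs only integer comparisons, which is what makes several filters over the same field inexpensive.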

