[jira] Updated: (LUCENE-1461) Cached filter for a single term field

Uwe Schindler (JIRA) Fri, 26 Jun 2009 02:29:46 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-1461:
----------------------------------

    Attachment: LUCENE-1461.patch

Hey Mike, same time... :-)

I did some research and also found out that a filter's DocIdSet should not 
list deleted documents.

Because of that, I changed the non-StringIndex iterators to use 
IndexReader.termDocs(null) to list the docIds. (The StringIndex case will never 
contain strings of deleted docs, because it has an order[]->0 mapping for 
them.) This is no real problem: it is just an iterator on a bitset, so the 
additional cost is low (tested with a 10 million document index).

I also created a superclass for all the iterators working on numbers, to get 
the termDocs handled easily. The type-specific iterators only override a 
matchDoc() method. The StringIndex iterator stays separate, because it is 
optimized and has no deleted-docs problem, as described before.
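
A minimal sketch of that superclass idea, written without any Lucene 
dependency. All names here (MatchDocSketch, LiveDocIterator, intRange) are 
hypothetical and are not the classes from the patch; the BitSet of live docs 
only stands in for what IndexReader.termDocs(null) enumerates.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; not the code from LUCENE-1461.patch. */
public class MatchDocSketch {

    /** Base iterator: walks live (non-deleted) doc ids and calls matchDoc() on each. */
    abstract static class LiveDocIterator {
        private final BitSet liveDocs; // assumption: stand-in for termDocs(null)
        private int doc = -1;
        private boolean exhausted = false;

        LiveDocIterator(BitSet liveDocs) {
            this.liveDocs = liveDocs;
        }

        /** Type-specific check against the cached value of the given document. */
        abstract boolean matchDoc(int doc);

        /** Returns the next live, matching doc id, or -1 when there are no more. */
        int nextDoc() {
            if (exhausted) {
                return -1;
            }
            for (int d = liveDocs.nextSetBit(doc + 1); d >= 0; d = liveDocs.nextSetBit(d + 1)) {
                if (matchDoc(d)) {
                    doc = d;
                    return d;
                }
            }
            exhausted = true;
            return -1;
        }
    }

    /** Int flavour: the subclass only supplies matchDoc() for an inclusive range. */
    static List<Integer> intRange(int[] values, BitSet liveDocs, int lower, int upper) {
        LiveDocIterator it = new LiveDocIterator(liveDocs) {
            @Override
            boolean matchDoc(int doc) {
                return values[doc] >= lower && values[doc] <= upper;
            }
        };
        List<Integer> hits = new ArrayList<>();
        for (int d = it.nextDoc(); d != -1; d = it.nextDoc()) {
            hits.add(d);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] values = {5, 0, 12, 0, 7}; // doc 3 is deleted; its array slot is just 0
        BitSet live = new BitSet();
        live.set(0);
        live.set(1);
        live.set(2);
        live.set(4);
        System.out.println(intRange(values, live, 0, 7)); // prints [0, 1, 4]
    }
}
```

In this sketch doc 3 holds a value inside the range, but it is never listed 
because the iteration is driven by the live docs, not by the value array.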

This patch also contains tests for all (except byte) types.

I will commit in a day or two.

(Another solution for the future would be to have an additional bitset for 
numeric values, alongside the native type array in FieldCache, that records 
whether the document had a term available. This would also cover the 
deleted docs.)
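
A rough sketch of that alternative, again without Lucene. The class and 
method names (ValuePresenceSketch, put, range) are hypothetical, and the 
sketch assumes that docs without a term, deleted ones included, are simply 
never passed to put() while the cache is filled.

```java
import java.util.BitSet;

/** Hypothetical illustration only; FieldCache has no such class. */
public class ValuePresenceSketch {
    private final int[] values;    // native type array, 0 for docs without a term
    private final BitSet hasValue; // set only for docs that really had a term

    ValuePresenceSketch(int maxDoc) {
        values = new int[maxDoc];
        hasValue = new BitSet(maxDoc);
    }

    /** Records the value of a doc and marks it as having a term. */
    void put(int doc, int value) {
        values[doc] = value;
        hasValue.set(doc);
    }

    /** Inclusive range check that ignores docs whose array slot was never filled. */
    BitSet range(int lower, int upper) {
        BitSet result = new BitSet(values.length);
        for (int doc = hasValue.nextSetBit(0); doc >= 0; doc = hasValue.nextSetBit(doc + 1)) {
            if (values[doc] >= lower && values[doc] <= upper) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ValuePresenceSketch cache = new ValuePresenceSketch(4);
        cache.put(0, 0); // a real value of 0
        cache.put(2, 3);
        // docs 1 and 3 never had a term; their slots are 0 only by default
        System.out.println(cache.range(0, 5)); // prints {0, 2}
    }
}
```

The point of the extra bitset is that a default 0 in the array can be told 
apart from a real value of 0, so no second pass over the index is needed.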

> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: DisjointMultiFilter.java, FieldCacheRangeFilter.patch, 
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, 
> LUCENE-1461.patch, LUCENE-1461a.patch, LUCENE-1461b.patch, 
> LUCENE-1461c.patch, RangeMultiFilter.java, RangeMultiFilter.java, 
> TermMultiFilter.java, TestFieldCacheRangeFilter.patch
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there need to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.
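
The approach in the description above can be sketched as follows. This is 
only an illustration under stated assumptions: every doc has exactly one 
term, the names (TermNumberRangeSketch, range) are hypothetical, and a plain 
list of doc ids stands in for the iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not the attached RangeMultiFilter code. */
public class TermNumberRangeSketch {
    private final TreeMap<String, Integer> termToNumber = new TreeMap<>();
    private final int[] docToNumber;

    /** termPerDoc[doc] is the single term of that doc (assumed non-null). */
    TermNumberRangeSketch(String[] termPerDoc) {
        // Collect the distinct terms; the TreeMap keeps them sorted.
        for (String term : termPerDoc) {
            termToNumber.put(term, 0);
        }
        // Number the terms in sorted order, so number order equals term order.
        int next = 0;
        for (Map.Entry<String, Integer> e : termToNumber.entrySet()) {
            e.setValue(next++);
        }
        docToNumber = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docToNumber[doc] = termToNumber.get(termPerDoc[doc]);
        }
    }

    /** Docs whose term lies in [lower, upper], found with int comparisons only. */
    List<Integer> range(String lower, String upper) {
        List<Integer> hits = new ArrayList<>();
        Map.Entry<String, Integer> lo = termToNumber.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToNumber.floorEntry(upper);
        if (lo == null || hi == null) {
            return hits; // the range lies entirely outside the indexed terms
        }
        int min = lo.getValue();
        int max = hi.getValue();
        for (int doc = 0; doc < docToNumber.length; doc++) {
            if (docToNumber[doc] >= min && docToNumber[doc] <= max) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Zero-padded ages, so that string order equals numeric order.
        String[] ages = {"030", "018", "045", "018", "027"};
        TermNumberRangeSketch index = new TermNumberRangeSketch(ages);
        System.out.println(index.range("020", "035")); // prints [0, 4]
    }
}
```

The string comparisons happen once per filter, when the bounds are mapped to 
term numbers; the per-document work is then two integer comparisons.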


