[jira] Commented: (LUCENE-1461) Cached filter for a single term field

Michael McCandless (JIRA) Fri, 26 Jun 2009 08:25:32 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724573#action_12724573
 ]


Michael McCandless commented on LUCENE-1461:
--------------------------------------------

bq. NumericRangeQuery also has this problem (see the TermEnum impl there). I 
could change both queries to simply return the empty iterator (as when 
upper<lower)

Right, and I see you've already fixed it!

From your performance runs, looking at the average times, forcing this
filter to take deletions into account made it ~2X slower.  That's
quite costly.

(Though you really should seed the Random() so that the two tests run
precisely the same set of queries against precisely the same index.)

I would imagine that for most usage of this filter, taking deletes
into account is not necessary, because it's being used as a filter
with a query whose scorer won't return deleted docs.  Then we've taken
this perf hit for nothing...

Somehow, we really need better control, when creating scorers, over just
when deletions / filters do and don't need to be "AND'd" in.

Also, this filter isn't good when not many docs pass the filter, since
it's an O(N) scan through the index.  Trie should do much better in
those cases.
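
To make the O(N) cost and the deletions check concrete, here is a minimal
sketch in plain Java. It does not use Lucene's API: the int array stands in
for the cached per-document term numbers, java.util.BitSet stands in for
both the deleted-docs set and the filter result, and all names are
hypothetical.

```java
import java.util.BitSet;

final class OrdRangeScan {
    // ords[doc] is the term number of the single term in that document.
    // deleted may be null, meaning "do not check deletions".
    static BitSet scan(int[] ords, int lowerOrd, int upperOrd, BitSet deleted) {
        BitSet result = new BitSet(ords.length);
        if (upperOrd < lowerOrd) {
            return result; // empty range matches nothing
        }
        // Every document is visited, however few of them match: O(N).
        for (int doc = 0; doc < ords.length; doc++) {
            int ord = ords[doc];
            boolean inRange = ord >= lowerOrd && ord <= upperOrd;
            // The extra lookup below is the per-document work that taking
            // deletions into account adds to the scan.
            if (inRange && (deleted == null || !deleted.get(doc))) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

Passing null for deleted corresponds to the case where the accompanying
query's scorer already skips deleted docs, so the filter need not check
them again.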

I wonder just how much faster it'd be if we made a hybrid approach
that, e.g., loads the trie fields into a fast in-memory postings format
(simple int arrays).  I.e., if you want to spend memory, spending it
on trie's postings would presumably net the best performance.  Once we
have flexible indexing we could presumably "swap in" an in-RAM
postings impl and then run trie against that.
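
A very rough sketch of what "postings as simple int arrays" could look
like, in plain Java. This is an assumption-laden illustration only: it is
not Lucene's postings API, it ignores trie's precision steps entirely, and
the class and field names are hypothetical. It only shows that a range
lookup over in-RAM postings touches the matching terms and their docs
rather than every document in the index.

```java
import java.util.Arrays;
import java.util.BitSet;

final class InMemoryPostings {
    private final int[] termValues; // sorted, distinct term values
    private final int[][] postings; // postings[i] = doc ids containing termValues[i]

    InMemoryPostings(int[] termValues, int[][] postings) {
        this.termValues = termValues;
        this.postings = postings;
    }

    // Collects the docs whose term value lies in [lower, upper].
    BitSet range(int lower, int upper) {
        BitSet result = new BitSet();
        int start = Arrays.binarySearch(termValues, lower);
        if (start < 0) {
            start = -start - 1; // insertion point: first term >= lower
        }
        // Only terms inside the range are visited, so the cost tracks the
        // number of matching postings instead of the index size.
        for (int i = start; i < termValues.length && termValues[i] <= upper; i++) {
            for (int doc : postings[i]) {
                result.set(doc);
            }
        }
        return result;
    }
}
```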


> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: DisjointMultiFilter.java, FieldCacheRangeFilter.patch, 
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, 
> LUCENE-1461.patch, LUCENE-1461a.patch, LUCENE-1461b.patch, 
> LUCENE-1461c.patch, PerfTest.java, RangeMultiFilter.java, 
> RangeMultiFilter.java, TermMultiFilter.java, TestFieldCacheRangeFilter.patch
>
>
> These classes implement inexpensive range filtering over a field containing a 
> single term. They do this by building an integer array of term numbers 
> (storing the term->number mapping in a TreeMap) and then implementing a fast 
> integer-comparison-based DocIdSetIterator.
> This code is currently being used to do age range filtering, but could also 
> be used to do other date filtering or in any application where there needs to 
> be multiple filters based on the same single term field. I have an untested 
> implementation of single term filtering and have considered but not yet 
> implemented term set filtering (useful for location based searches) as well. 
> The code here is fairly rough; it works but lacks javadocs and toString() and 
> hashCode() methods etc. I'm posting it here to discover if there is other 
> interest in this feature; I don't mind fixing it up but would hate to go to 
> the effort if it's not going to make it into Lucene.
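
For readers unfamiliar with the approach in the description above, a
minimal sketch of the term-numbering step in plain Java. It is not the code
from the attached patches: the class name, the String[] input (one term per
document) and the method are hypothetical stand-ins.

```java
import java.util.Map;
import java.util.TreeMap;

final class TermNumbering {
    // values[doc] is the single term of each document. Fills termToOrd with
    // the term->number mapping (numbers follow sorted term order) and
    // returns the per-document array of term numbers.
    static int[] toOrds(String[] values, TreeMap<String, Integer> termToOrd) {
        for (String value : values) {
            termToOrd.put(value, 0); // collect distinct terms, sorted by the TreeMap
        }
        int next = 0;
        for (Map.Entry<String, Integer> entry : termToOrd.entrySet()) {
            entry.setValue(next++); // number the terms in sorted order
        }
        int[] ords = new int[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            ords[doc] = termToOrd.get(values[doc]);
        }
        return ords;
    }
}
```

Because the numbers follow term order, a term range becomes an integer
range: the bounds can be looked up with TreeMap's ceilingEntry(lower) and
floorEntry(upper), after which each document needs only two integer
comparisons.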

