[ https://issues.apache.org/jira/browse/LUCENE-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724573#action_12724573 ]
Michael McCandless commented on LUCENE-1461:
--------------------------------------------

bq. This problem has also NumericRangeQuery (see the TermEnum impl there). I could change both queries to simply return the empty iterator (like when upper<lower)

Right, and I see you've already fixed it!

From your performance runs, looking at the average times, forcing this filter to take deletions into account made it ~2X slower. That's quite costly. (Though, you really should seed the Random() so the two tests run precisely the same set of queries against precisely the same index).

I would imagine that for most usage of this filter, taking deletes into account is not necessary, because it's being used as a filter with a query whose scorer won't return deleted docs. Then we've taken this perf hit for nothing...

Somehow, we really need better control, when creating scorers, on just when we need and don't need deletions / filters to be "AND'd" in.

Also, this filter isn't good when not many docs pass the filter, since it's an O(N) scan through the index. Trie should do much better in those cases.

I wonder, if we could make a hybrid approach that e.g. loads the trie fields into a fast in-memory postings format (simple int arrays), just how much faster it'd be. I.e., if you want to spend memory, spending it on trie's postings would presumably net the best performance. Once we have flexible indexing we could presumably "swap in" an in-RAM postings impl and then run trie against that.
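
On seeding the Random() for the performance runs: a minimal sketch of what that could look like, assuming the test draws random range bounds for each query. The class name, the randomRanges helper and the seed value are hypothetical and are not taken from PerfTest.java.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededQueryGenerator {
    // Hypothetical helper: returns `count` [lower, upper] pairs drawn from [0, maxValue).
    public static int[][] randomRanges(long seed, int count, int maxValue) {
        // A fixed seed makes java.util.Random produce the same sequence on every run,
        // so both test variants see exactly the same queries.
        Random random = new Random(seed);
        int[][] ranges = new int[count][2];
        for (int i = 0; i < count; i++) {
            int a = random.nextInt(maxValue);
            int b = random.nextInt(maxValue);
            ranges[i][0] = Math.min(a, b); // lower bound
            ranges[i][1] = Math.max(a, b); // upper bound
        }
        return ranges;
    }

    public static void main(String[] args) {
        int[][] withDeletes = randomRanges(42L, 5, 1000);
        int[][] withoutDeletes = randomRanges(42L, 5, 1000);
        // Same seed, same query set for both runs.
        System.out.println(Arrays.deepEquals(withDeletes, withoutDeletes)); // prints "true"
    }
}
```

With the same seed passed to both runs, any difference in the timings comes from the code under test rather than from a different mix of queries.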
> Cached filter for a single term field
> -------------------------------------
>
>                 Key: LUCENE-1461
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1461
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Tim Sturge
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: DisjointMultiFilter.java, FieldCacheRangeFilter.patch,
> LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch, LUCENE-1461.patch,
> LUCENE-1461.patch, LUCENE-1461a.patch, LUCENE-1461b.patch,
> LUCENE-1461c.patch, PerfTest.java, RangeMultiFilter.java,
> RangeMultiFilter.java, TermMultiFilter.java, TestFieldCacheRangeFilter.patch
>
>
> These classes implement inexpensive range filtering over a field containing a
> single term. They do this by building an integer array of term numbers
> (storing the term->number mapping in a TreeMap) and then implementing a fast
> integer comparison based DocSetIdIterator.
> This code is currently being used to do age range filtering, but could also
> be used to do other date filtering or in any application where there need to
> be multiple filters based on the same single term field. I have an untested
> implementation of single term filtering and have considered but not yet
> implemented term set filtering (useful for location based searches) as well.
> The code here is fairly rough; it works but lacks javadocs and toString() and
> hashCode() methods etc. I'm posting it here to discover if there is other
> interest in this feature; I don't mind fixing it up but would hate to go to
> the effort if it's not going to make it into Lucene.
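
The approach in the issue description (an integer array of term numbers, with the term->number mapping kept in a TreeMap, scanned with plain integer comparisons) can be sketched roughly as below. This is plain Java under stated assumptions, not the attached patch: it uses no Lucene classes, OrdinalRangeFilterSketch and matchingDocs are hypothetical names, every document is assumed to have exactly one term, and deleted docs are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class OrdinalRangeFilterSketch {
    // term -> term number (ordinal in sorted term order)
    private final TreeMap<String, Integer> termToOrd = new TreeMap<>();
    // docOrds[docId] = term number of that document's single term
    private final int[] docOrds;

    public OrdinalRangeFilterSketch(String[] termPerDoc) {
        int ord = 0;
        for (String term : new TreeSet<>(Arrays.asList(termPerDoc))) {
            termToOrd.put(term, ord++);
        }
        docOrds = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            docOrds[doc] = termToOrd.get(termPerDoc[doc]);
        }
    }

    // Returns the doc ids whose term lies in [lower, upper], both bounds inclusive.
    public List<Integer> matchingDocs(String lower, String upper) {
        List<Integer> hits = new ArrayList<>();
        if (lower.compareTo(upper) > 0) {
            return hits; // upper < lower: nothing can match
        }
        // Translate the term bounds into term-number bounds once per filter.
        Map.Entry<String, Integer> lo = termToOrd.ceilingEntry(lower);
        Map.Entry<String, Integer> hi = termToOrd.floorEntry(upper);
        if (lo == null || hi == null || lo.getValue() > hi.getValue()) {
            return hits; // no indexed term falls inside the range
        }
        int loOrd = lo.getValue();
        int hiOrd = hi.getValue();
        // O(N) scan over all docs: two int comparisons per document.
        for (int doc = 0; doc < docOrds.length; doc++) {
            int o = docOrds[doc];
            if (o >= loOrd && o <= hiOrd) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Zero-padded ages so that string order matches numeric order.
        String[] ages = {"030", "018", "045", "030", "060"};
        OrdinalRangeFilterSketch filter = new OrdinalRangeFilterSketch(ages);
        System.out.println(filter.matchingDocs("025", "045")); // prints "[0, 2, 3]"
    }
}
```

The scan touches every document regardless of how many match, which is why this style of filter is cheap per document but does poorly when only a few docs pass.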