Maybe I am not understanding the patch, but isn't casting the result of Filter.getDocIdSet to OpenBitSet kind of dangerous? And isn't assuming that a Filter constructs a bitset something we want to move away from?
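To make the concern concrete, here is a minimal sketch of checking for a random-access capability instead of casting blindly. All of the types below (SketchDocIdSet, SketchRandomAccess, and so on) are hypothetical stand-ins written for this mail; they are not Lucene's classes and not what the patch does.

```java
import java.util.BitSet;
import java.util.Optional;

// Hypothetical stand-in for a doc id set that only promises iteration.
interface SketchDocIdSet {
    /** Returns the smallest doc id >= target in the set, or -1 if there is none. */
    int advance(int target);
}

// Hypothetical optional capability: membership test by doc id.
interface SketchRandomAccess {
    boolean get(int docId);
}

// A set backed by bits, which can offer both iteration and random access.
final class BitsDocIdSet implements SketchDocIdSet, SketchRandomAccess {
    private final BitSet bits;
    BitsDocIdSet(BitSet bits) { this.bits = bits; }
    public int advance(int target) { return bits.nextSetBit(target); }
    public boolean get(int docId) { return bits.get(docId); }
}

// A set backed by a sorted array, which offers iteration only.
final class SortedIntsDocIdSet implements SketchDocIdSet {
    private final int[] docs; // sorted ascending
    SortedIntsDocIdSet(int[] docs) { this.docs = docs; }
    public int advance(int target) {
        for (int d : docs) {
            if (d >= target) return d;
        }
        return -1;
    }
}

final class SafeFilterAccess {
    /** Returns a random-access view only when the set really supports one. */
    static Optional<SketchRandomAccess> randomAccessView(SketchDocIdSet set) {
        if (set instanceof SketchRandomAccess) {
            return Optional.of((SketchRandomAccess) set);
        }
        return Optional.empty();
    }

    /** Uses random access when available, otherwise falls back to advance(). */
    static boolean contains(SketchDocIdSet set, int docId) {
        Optional<SketchRandomAccess> ra = randomAccessView(set);
        if (ra.isPresent()) {
            return ra.get().get(docId);
        }
        return set.advance(docId) == docId;
    }
}
```

With a check like this, a filter that does not build a bitset still works through the iterator path rather than failing with a ClassCastException.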
-John

On Mon, Apr 20, 2009 at 4:27 PM, Jason Rutherglen (JIRA) <j...@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700984#action_12700984 ]
>
> Jason Rutherglen edited comment on LUCENE-1536 at 4/20/09 4:26 PM:
> -------------------------------------------------------------------
>
> Perhaps we can go ahead with this patch given we're not sure how
> to do an optimized version of LUCENE-1518 yet. This patch
> entails passing the RandomAccessFilter to TermScorer, what's a
> good way to do this without rewriting too much of the Lucene API?
>
> * TermQuery.createWeight -> TermWeight.scorer instantiates the
> TermScorer which is where we need to pass in the filter? So we
> could somehow pass the filter in via multiple constructors? I
> didn't see a clean API way though.
>
> * Or we can add a new method to Scorer, something like
> getSequentialSubScorers? Which we then iterate over and if one
> is a TermScorer set the filter(s). This setting of the RAF would
> happen in IndexSearcher.doSearch.
>
>       was (Author: jasonrutherglen):
>     Perhaps we can go ahead with this patch given we're not sure how
> to do an optimized version of LUCENE-1345 yet. This patch
> entails passing the RandomAccessFilter to TermScorer, what's a
> good way to do this without rewriting too much of the Lucene API?
>
> * TermQuery.createWeight -> TermWeight.scorer instantiates the
> TermScorer which is where we need to pass in the filter? So we
> could somehow pass the filter in via multiple constructors? I
> didn't see a clean API way though.
>
> * Or we can add a new method to Scorer, something like
> getSequentialSubScorers? Which we then iterate over and if one
> is a TermScorer set the filter(s). This setting of the RAF would
> happen in IndexSearcher.doSearch.
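For reference, the second option in the quoted comment (walk the sub-scorers and set the filter on every term-level scorer) might look roughly like the sketch below. The classes are hypothetical stand-ins that only borrow the getSequentialSubScorers name from the proposal; they are not Lucene's Scorer or TermScorer, and the BitSet filter is an assumption made to keep the example self-contained.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Hypothetical base class; leaves report no children.
abstract class SketchScorer {
    List<SketchScorer> getSequentialSubScorers() {
        return Collections.emptyList();
    }
}

// Hypothetical leaf scorer that can hold a random-access filter.
final class SketchTermScorer extends SketchScorer {
    BitSet filter; // null means unfiltered

    void setFilter(BitSet filter) {
        this.filter = filter;
    }
}

// Hypothetical composite scorer that exposes its children.
final class SketchCompositeScorer extends SketchScorer {
    private final List<SketchScorer> subs;

    SketchCompositeScorer(List<SketchScorer> subs) {
        this.subs = subs;
    }

    @Override
    List<SketchScorer> getSequentialSubScorers() {
        return subs;
    }
}

final class FilterPusher {
    /** Recursively sets the filter on every term-level leaf; returns the number of leaves set. */
    static int pushFilter(SketchScorer scorer, BitSet filter) {
        if (scorer instanceof SketchTermScorer) {
            ((SketchTermScorer) scorer).setFilter(filter);
            return 1;
        }
        int count = 0;
        for (SketchScorer sub : scorer.getSequentialSubScorers()) {
            count += pushFilter(sub, filter);
        }
        return count;
    }
}
```

In this sketch the traversal would be driven once from the search entry point, which corresponds to the suggestion of doing it in IndexSearcher.doSearch.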
>
>
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
> >                 Key: LUCENE-1536
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
> >             Project: Lucene - Java
> >          Issue Type: Improvement
> >          Components: Search
> >    Affects Versions: 2.4
> >            Reporter: Michael McCandless
> >            Assignee: Michael McCandless
> >            Priority: Minor
> >             Fix For: 2.9
> >
> >         Attachments: LUCENE-1536.patch
> >
> >
> > I ran some performance tests, comparing applying a filter via
> > random-access API instead of current trunk's iterator API.
> > This was inspired by LUCENE-1476, where we realized deletions should
> > really be implemented just like a filter, but then in testing found
> > that switching deletions to iterator was a very sizable performance
> > hit.
> > Some notes on the test:
> >   * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
> >     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> >   * I test across multiple queries. 1-X means an OR query, eg 1-4
> >     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> >     AND 3 AND 4. "u s" means "united states" (phrase search).
> >   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> >     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
> >     100 (filter=null, control)).
> >   * Method high means I use random-access filter API in
> >     IndexSearcher's main loop. Method low means I use random-access
> >     filter API down in SegmentTermDocs (just like deleted docs
> >     today).
> >   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> >     "high" (ie in IndexSearcher's search loop).
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
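The iterator versus random-access distinction tested in the issue can be illustrated with a small self-contained sketch. It uses java.util.BitSet as an assumed filter representation and a sorted int array as the query's hits; the class and method names are hypothetical, and none of this is Lucene code or the benchmark itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class FilterApplySketch {
    /**
     * Iterator style: the filter is advanced like a second doc id stream
     * and leapfrogged against the sorted hits.
     */
    static List<Integer> applyAsIterator(int[] sortedHits, BitSet filter) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int f = filter.nextSetBit(0);
        while (i < sortedHits.length && f >= 0) {
            if (sortedHits[i] < f) {
                i++;                                  // hit is behind the filter, move the hit forward
            } else if (sortedHits[i] > f) {
                f = filter.nextSetBit(sortedHits[i]); // filter is behind, skip it ahead to the hit
            } else {
                out.add(f);                           // both streams agree on this doc
                i++;
                f = filter.nextSetBit(f + 1);
            }
        }
        return out;
    }

    /** Random-access style: each hit is checked directly against the filter. */
    static List<Integer> applyRandomAccess(int[] sortedHits, BitSet filter) {
        List<Integer> out = new ArrayList<>();
        for (int doc : sortedHits) {
            if (filter.get(doc)) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

Both methods return the same documents; they differ only in how the filter is consulted, which is the cost the issue's benchmark compares.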