Re: [jira] Created: (LUCENE-1533) Deleted documents as a Filter or top level Query

Michael McCandless Tue, 03 Feb 2009 16:15:35 -0800


eks dev wrote:

I agree we should warn about this in the javadocs... can you work up a patch?
I'll give it a try, no promise when, changing job, moving...
I plan to run some tests to figure out the performance tradeoffs here.
We switched to iterator access for a top-level filter as of LUCENE-584, but from LUCENE-1476 it's looking like, except for fairly sparse filters, random access is
much faster.
have a look at  https://issues.apache.org/jira/browse/LUCENE-1436
this should be important for deletions case

OK.

I'll just keep dumping my thinking about it, maybe something meaningful comes out; unfortunately not enough time to think deeper or try it now...


No problem; keep it coming!

as long as we look at single term queries, high deletion density cases should be faster with random access (or anything else) at TermDocs level, because we will be just propagating the decision higher up instead of "killing" the document at TermDocs level. Cases with more terms, disjunctions, are getting interesting, starting to feel a speed-up proportional to the number of intersecting documents.
for Query (A OR B) we need to check the if(deleted) condition #A + #B times if we do it at TermDocs level; in the filter case we need to do it only
#(A\B) + #(B\A) + #(A AND B) times, and this number is smaller than or equal to (worst case) #A + #B
this is exactly the case that makes performance headaches.

Right! This is why I'm going to test the full matrix (different queries X different sparseness filters).
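
To make the counting argument for (A OR B) concrete, here is a small sketch in plain Java. It is not Lucene code: the class and method names are made up for illustration, and java.util.BitSet stands in for the posting lists of terms A and B.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: counts how many if(deleted) checks
// each strategy performs for the query (A OR B).
final class DeletionCheckCounts {

    // Check at the TermDocs level: every posting of every term is tested,
    // so a document that matches both terms is tested twice. Count = #A + #B.
    static int termDocsLevelChecks(BitSet a, BitSet b) {
        return a.cardinality() + b.cardinality();
    }

    // Check at the filter level: each document in the union is tested once.
    // Count = #(A\B) + #(B\A) + #(A AND B).
    static int filterLevelChecks(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        return union.cardinality();
    }

    // Builds a bit set from a list of doc IDs.
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet a = docs(1, 2, 3, 5, 8);      // #A = 5
        BitSet b = docs(2, 3, 4, 8, 9, 10);  // #B = 6, three docs shared with A
        System.out.println("TermDocs level checks: " + termDocsLevelChecks(a, b)); // 11
        System.out.println("Filter level checks:   " + filterLevelChecks(a, b));   // 8
    }
}
```

The difference between the two counts is exactly #(A AND B), so the saving grows with the overlap between the terms and disappears when A and B are disjoint.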

We have two competing issues, constant time factor on skipTo() vs get() and algorithmic enhancement due to saved checks. Balance depends on Query and skipTo()/get() performance diff.
Maybe thinking along the "Filter with both options" lines, random (optional support) and iterator?

Right: I'm guessing we eventually need a simplistic query optimizer that would choose how to apply a top-level AND'd filter (or AND NOT, eg for deletions). If the filter is very sparse, it's probably best to use iterator, especially if the filter is already iterator-friendly, eg SortedVIntList.
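
A rough sketch of what such a chooser might look like, in plain Java. Everything here is hypothetical: the class, the enum, and especially the density cutoff, which is a placeholder that the tests mentioned above would have to replace with a measured value.

```java
// Hypothetical sketch, not Lucene API: picks how to apply a top-level filter.
final class FilterStrategyChooser {

    enum Strategy { ITERATOR, RANDOM_ACCESS }

    // Assumed placeholder cutoff; the real crossover point is unknown
    // until the query x sparseness matrix has been measured.
    static final double SPARSE_DENSITY_CUTOFF = 0.01;

    // filterCardinality: number of docs the filter accepts
    // maxDoc: number of docs in the index
    // supportsRandomAccess: whether the filter can answer get(docId) cheaply
    static Strategy choose(long filterCardinality, long maxDoc, boolean supportsRandomAccess) {
        if (!supportsRandomAccess || maxDoc <= 0) {
            return Strategy.ITERATOR; // no other option available
        }
        double density = (double) filterCardinality / maxDoc;
        // Very sparse filters favor iteration; denser ones favor random access.
        return density < SPARSE_DENSITY_CUTOFF ? Strategy.ITERATOR : Strategy.RANDOM_ACCESS;
    }

    public static void main(String[] args) {
        System.out.println(choose(10, 1_000_000, true));       // ITERATOR
        System.out.println(choose(500_000, 1_000_000, true));  // RANDOM_ACCESS
        System.out.println(choose(500_000, 1_000_000, false)); // ITERATOR
    }
}
```

A real optimizer would probably also look at the query itself (number of terms, disjunction vs conjunction), since the balance depends on both sides.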

At the end of the day, Filter works at API level with DocIdSet, not DocIdSetIterator... that would remove the constant factor; the question is whether it is possible to add an optional DocIdSet.get(int) to the current API and use it for some specialized cases like this one.

Either add that random-access API, or make a RandomAccessDocIdSet, or... something. Not sure yet. Ideally BooleanQuery (which we should consolidate deletions & top-level filters under) should somehow ask the filter if it's random access, and then drive the matching accordingly.
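
One possible shape for "ask the filter if it's random access, then drive the matching accordingly", sketched in plain Java. None of these types are Lucene API: SimpleDocIdSet, RandomAccessSimpleDocIdSet and FilteredMatcher are stand-ins invented for this illustration, and the query's candidate docs are simply an ascending int array.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical matcher: uses get() when the filter offers it, else the iterator.
final class FilteredMatcher {

    // candidates: ascending doc IDs produced by the query.
    static List<Integer> match(int[] candidates, SimpleDocIdSet filter) {
        List<Integer> hits = new ArrayList<>();
        if (filter instanceof RandomAccessSimpleDocIdSet) {
            RandomAccessSimpleDocIdSet ra = (RandomAccessSimpleDocIdSet) filter;
            for (int doc : candidates) {
                if (ra.get(doc)) hits.add(doc); // one random-access check per candidate
            }
            return hits;
        }
        // Iterator path: advance both sorted sequences in step.
        PrimitiveIterator.OfInt it = filter.iterator();
        if (!it.hasNext()) return hits;
        int f = it.nextInt();
        int i = 0;
        while (i < candidates.length) {
            if (candidates[i] < f) {
                i++;
            } else {
                if (candidates[i] == f) hits.add(candidates[i++]);
                if (!it.hasNext()) break;
                f = it.nextInt();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3); bits.set(4); bits.set(7);
        int[] candidates = {1, 3, 5, 7};
        SimpleDocIdSet iteratorOnly = () -> IntStream.of(3, 4, 7).iterator();
        System.out.println(match(candidates, new BitSetDocIdSet(bits))); // [3, 7]
        System.out.println(match(candidates, iteratorOnly));             // [3, 7]
    }
}

// Stand-in for DocIdSet: iteration in ascending doc ID order only.
interface SimpleDocIdSet {
    PrimitiveIterator.OfInt iterator();
}

// Stand-in for the optional random-access extension discussed above.
interface RandomAccessSimpleDocIdSet extends SimpleDocIdSet {
    boolean get(int docId);
}

// Bit-set backed implementation that supports both access styles.
final class BitSetDocIdSet implements RandomAccessSimpleDocIdSet {
    private final BitSet bits;
    BitSetDocIdSet(BitSet bits) { this.bits = bits; }
    public PrimitiveIterator.OfInt iterator() { return bits.stream().iterator(); }
    public boolean get(int docId) { return bits.get(docId); }
}
```

A separate sub-interface keeps iterator-only sets such as SortedVIntList untouched; adding an optional get(int) directly to DocIdSet would be the other route, at the cost of every implementation having to say whether it supports it.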

also, math for conjunctions looks much better in filters


I'll test conjunctions too.

Mike
