[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668523#action_12668523 ]
Michael McCandless commented on LUCENE-1476:
--------------------------------------------

Hmm... when I run TestDeletesDocIdSet, I don't see as much improvement: trunk takes 10.507 seconds and the patch takes 10.158 seconds (~3.3% faster). I'm running on OS X 10.5.6, quad-core Intel; the Java version is "1.6.0_07-b06-153", and I run "java -server -Xbatch -Xmx1024M -Xms1024M".

But that test is rather synthetic: it creates 15,000 docs, deletes 1 in 8, then runs a search (for "text") that matches all of the docs.

So I went back to contrib/benchmark. I created a single-segment index from the first 2M docs of Wikipedia, with 0%, 1%, 2%, 5%, 10%, 20% and 50% deletions, then ran 5 different queries and compared QPS with the patch vs. trunk. Results:

||% deletes||query||hits||QPS (trunk)||QPS (patch)||% change||
|0%|147|4984|5560.1|5486.2|-1.3%|
|0%|text|97191|347.3|339.4|-2.3%|
|0%|1 AND 2|234634|22.9|22.8|-0.4%|
|0%|1|386435|88.4|87.2|-1.4%|
|0%|1 OR 2|535584|20.9|20.9|0.0%|
|1%|147|4933|5082.0|1419.2|-72.1%|
|1%|text|96143|313.9|142.0|-54.8%|
|1%|1 AND 2|232250|22.1|18.6|-15.8%|
|1%|1|382498|81.0|62.2|-23.2%|
|1%|1 OR 2|530212|20.2|17.5|-13.4%|
|2%|147|4883|5133.6|1959.0|-61.8%|
|2%|text|95190|315.8|149.2|-52.8%|
|2%|1 AND 2|229870|22.2|18.4|-17.1%|
|2%|1|378641|81.2|58.9|-27.5%|
|2%|1 OR 2|524873|20.3|17.1|-15.8%|
|5%|147|4729|5073.6|2600.8|-48.7%|
|5%|text|92293|315.2|166.9|-47.0%|
|5%|1 AND 2|222859|22.5|17.8|-20.9%|
|5%|1|367000|81.0|56.2|-30.6%|
|5%|1 OR 2|508632|20.4|16.3|-20.1%|
|10%|147|4475|5049.6|2953.7|-41.5%|
|10%|text|87504|314.8|180.9|-42.5%|
|10%|1 AND 2|210982|22.9|17.8|-22.3%|
|10%|1|347664|81.5|53.8|-34.0%|
|10%|1 OR 2|481792|21.2|16.5|-22.2%|
|20%|147|4012|5045.0|3455.5|-31.5%|
|20%|text|77980|317.2|204.7|-35.5%|
|20%|1 AND 2|187605|23.9|19.2|-19.7%|
|20%|1|309040|82.0|54.7|-33.3%|
|20%|1 OR 2|428232|22.3|17.5|-21.5%|
|50%|147|2463|5283.2|4731.4|-10.4%|
|50%|text|48331|336.9|290.2|-13.9%|
|50%|1 AND 2|116887|28.4|25.9|-8.8%|
|50%|1|193154|86.4|74.9|-13.3%|
|50%|1 OR 2|267525|27.6|24.9|-9.8%|

I think one major source of slowness is the BitVector.nextSetBit() impl: it currently checks one bit at a time, but it'd be much better to use OpenBitSet's approach (test a whole 64-bit word at a time and skip words that are zero).

I also think, realistically, this approach won't perform very well until we switch to a sparse representation for the bit set, so that next() and skipTo() are fast when deletions are sparse.

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch,
>                      LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff,
>                      quasi_iterator_deletions_r2.diff, searchdeletes.alg, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.
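For illustration, here is a minimal sketch contrasting a bit-at-a-time nextSetBit() with the word-at-a-time scan that OpenBitSet uses. It assumes a byte[] layout like BitVector's (bit i lives in bits[i >> 3] at position i & 7); the class NextSetBitSketch and its methods are hypothetical and are not Lucene's actual BitVector or OpenBitSet code.

```java
/** Hypothetical sketch (not Lucene code): a byte[]-backed bit set laid out
 *  like BitVector, i.e. bit i is stored in bits[i >> 3] at position i & 7. */
class NextSetBitSketch {
    private final byte[] bits;
    private final int size;

    NextSetBitSketch(int size) {
        this.size = size;
        this.bits = new byte[(size >> 3) + 1];
    }

    void set(int bit) {
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    boolean get(int bit) {
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    /** Bit-at-a-time scan: one get() per candidate doc, the slow pattern. */
    int nextSetBitSlow(int from) {
        for (int i = from; i < size; i++) {
            if (get(i)) return i;
        }
        return -1;
    }

    /** Word-at-a-time scan in the spirit of OpenBitSet.nextSetBit():
     *  shift away bits below 'from' in the first byte, skip all-zero bytes
     *  with one comparison each, then find the lowest set bit with
     *  Integer.numberOfTrailingZeros. */
    int nextSetBitFast(int from) {
        if (from >= size) return -1;
        int i = from >> 3;
        int word = (bits[i] & 0xFF) >>> (from & 7); // drop bits below 'from'
        if (word != 0) {
            int bit = from + Integer.numberOfTrailingZeros(word);
            return bit < size ? bit : -1;
        }
        while (++i < bits.length) {
            word = bits[i] & 0xFF;
            if (word != 0) {
                int bit = (i << 3) + Integer.numberOfTrailingZeros(word);
                return bit < size ? bit : -1;
            }
        }
        return -1;
    }
}
```

Both methods return the same answers; the fast path simply avoids testing each zero bit individually. OpenBitSet applies the same idea to 64-bit longs, so it skips eight times as many docs per comparison as this byte-based sketch.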
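The sparse representation mentioned in the comment could look roughly like the following minimal sketch, assuming deletions are stored as a sorted array of distinct doc IDs. SparseDeletesSketch is a hypothetical name; its doc()/next()/skipTo(int) methods imitate the Lucene 2.4 DocIdSetIterator contract, but the class does not extend it and is not part of the attached patches.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Lucene code): deleted doc IDs kept as a sorted
 *  int[]; iteration mimics the 2.4-era DocIdSetIterator methods
 *  doc(), next() and skipTo(target) without depending on Lucene. */
class SparseDeletesSketch {
    private final int[] docs; // sorted, assumed-distinct deleted doc IDs
    private int pos = -1;     // index of the current doc; -1 before the first call

    SparseDeletesSketch(int[] deletedDocs) {
        this.docs = deletedDocs.clone();
        Arrays.sort(this.docs);
    }

    /** Current doc ID; only valid after next() or skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Advance to the next deleted doc in O(1), independent of maxDoc. */
    boolean next() {
        if (pos + 1 >= docs.length) {
            pos = docs.length;
            return false;
        }
        pos++;
        return true;
    }

    /** Advance to the first deleted doc beyond the current one that is
     *  >= target, binary-searching only the not-yet-visited tail. */
    boolean skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) {
            pos = docs.length;
            return false;
        }
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        if (idx < 0) idx = -idx - 1; // insertion point: first element > target
        pos = idx;
        return pos < docs.length;
    }
}
```

Here next() costs O(1) and skipTo() costs O(log(remaining deletions)), so neither touches the non-deleted doc IDs. The trade-off is memory: an int costs 32 bits per deleted doc versus 1 bit per doc for a bit set, so this layout only pays off while fewer than roughly 1 in 32 docs are deleted.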