[ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668523#action_12668523
 ] 

Michael McCandless commented on LUCENE-1476:
--------------------------------------------

Hmm... when I run the TestDeletesDocIdSet, I don't see as much
improvement: trunk gets 10.507 seconds, patch gets 10.158 (~3.3%
faster).  I'm running on OS X 10.5.6, quad core Intel; java version is
"1.6.0_07-b06-153" and I run "java -server -Xbatch -Xmx1024M
-Xms1024M".

But that test is rather synthetic: you create 15,000 docs, then delete
1 in 8, then do a search (for "text") that matches all of the docs.

So I went back to contrib/benchmark...  I created a single-segment
index with 0%, 1%, 2%, 5%, 10%, 20% and 50% deletions, using the first
2M docs from Wikipedia, then ran 5 different queries and compared qps
with the patch vs trunk.  Results:

||% deletes||query||hits||qps (trunk)||qps (patch)||% change||
|0%|147|   4984|5560.1|5486.2| -1.3%|
|0%|text|  97191| 347.3| 339.4| -2.3%|
|0%|1 AND 2| 234634|  22.9|  22.8| -0.4%|
|0%|1| 386435|  88.4|  87.2| -1.4%|
|0%|1 OR 2| 535584|  20.9|  20.9|  0.0%|
|1%|147|   4933|5082.0|1419.2|-72.1%|
|1%|text|  96143| 313.9| 142.0|-54.8%|
|1%|1 AND 2| 232250|  22.1|  18.6|-15.8%|
|1%|1| 382498|  81.0|  62.2|-23.2%|
|1%|1 OR 2| 530212|  20.2|  17.5|-13.4%|
|2%|147|   4883|5133.6|1959.0|-61.8%|
|2%|text|  95190| 315.8| 149.2|-52.8%|
|2%|1 AND 2| 229870|  22.2|  18.4|-17.1%|
|2%|1| 378641|  81.2|  58.9|-27.5%|
|2%|1 OR 2| 524873|  20.3|  17.1|-15.8%|
|5%|147|   4729|5073.6|2600.8|-48.7%|
|5%|text|  92293| 315.2| 166.9|-47.0%|
|5%|1 AND 2| 222859|  22.5|  17.8|-20.9%|
|5%|1| 367000|  81.0|  56.2|-30.6%|
|5%|1 OR 2| 508632|  20.4|  16.3|-20.1%|
|10%|147|   4475|5049.6|2953.7|-41.5%|
|10%|text|  87504| 314.8| 180.9|-42.5%|
|10%|1 AND 2| 210982|  22.9|  17.8|-22.3%|
|10%|1| 347664|  81.5|  53.8|-34.0%|
|10%|1 OR 2| 481792|  21.2|  16.5|-22.2%|
|20%|147|   4012|5045.0|3455.5|-31.5%|
|20%|text|  77980| 317.2| 204.7|-35.5%|
|20%|1 AND 2| 187605|  23.9|  19.2|-19.7%|
|20%|1| 309040|  82.0|  54.7|-33.3%|
|20%|1 OR 2| 428232|  22.3|  17.5|-21.5%|
|50%|147|   2463|5283.2|4731.4|-10.4%|
|50%|text|  48331| 336.9| 290.2|-13.9%|
|50%|1 AND 2| 116887|  28.4|  25.9| -8.8%|
|50%|1| 193154|  86.4|  74.9|-13.3%|
|50%|1 OR 2| 267525|  27.6|  24.9| -9.8%|

I think one major source of slowness is the BitVector.nextSetBit()
implementation: it currently checks one bit at a time, but it'd be
much better to use OpenBitSet's approach.
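
To illustrate the difference, here is a minimal sketch of a bit-at-a-time
scan next to a word-at-a-time scan.  It is not the BitVector or OpenBitSet
code: the class WordScanBitSet and its methods are hypothetical, and it
assumes the bits are stored in a long[] so that an all-zero word can be
skipped with a single comparison.

```java
/** Hypothetical bit set used only to contrast the two scanning strategies. */
public class WordScanBitSet {
    private final long[] words;   // 64 bits per word
    private final int numBits;

    public WordScanBitSet(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    public void set(int index) {
        words[index >>> 6] |= 1L << (index & 63);
    }

    /** Bit-at-a-time scan: one test per bit until a set bit is found. */
    public int nextSetBitNaive(int from) {
        for (int i = Math.max(from, 0); i < numBits; i++) {
            if ((words[i >>> 6] & (1L << (i & 63))) != 0) {
                return i;
            }
        }
        return -1;
    }

    /** Word-at-a-time scan: skips 64 clear bits per comparison. */
    public int nextSetBit(int from) {
        if (from < 0) {
            from = 0;
        }
        if (from >= numBits) {
            return -1;
        }
        int w = from >>> 6;
        // Mask off the bits below 'from' in the first word.
        long word = words[w] & (-1L << (from & 63));
        while (true) {
            if (word != 0) {
                // Lowest set bit of the word gives the answer directly.
                return (w << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++w == words.length) {
                return -1;
            }
            word = words[w];
        }
    }
}
```

With few deletions most words are zero, so the second method does roughly
1/64th of the comparisons of the first when scanning between deleted docs.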

I also think, realistically, this approach won't perform very well
until we switch to a sparse representation for the bit set, so that
next() and skipTo() perform well.
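
As a rough sketch of what a sparse representation could look like (one
possible option, not something from the patch): keep the deleted doc IDs
in a sorted int[] and iterate over that.  The class SparseDeletedDocs, its
NO_MORE_DOCS sentinel and the int-returning next()/skipTo() signatures are
assumptions made for illustration; they are simplified and do not match
DocIdSetIterator exactly.

```java
import java.util.Arrays;

/** Hypothetical iterator over deleted doc IDs held in a sorted array. */
public class SparseDeletedDocs {
    /** Sentinel returned once the iterator is exhausted (assumed convention). */
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;   // sorted, no duplicates
    private int pos = -1;       // index of the current doc, -1 before the first

    public SparseDeletedDocs(int[] sortedDeletedDocs) {
        this.docs = sortedDeletedDocs.clone();
    }

    /** Moves to the next deleted doc; O(1), independent of the gaps between docs. */
    public int next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Moves to the first deleted doc >= target that lies after the current one. */
    public int skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) {
            pos = docs.length;
            return NO_MORE_DOCS;
        }
        // Binary search over the remaining docs: O(log n) per skip.
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        if (idx < 0) {
            idx = -idx - 1;     // insertion point = first doc greater than target
        }
        pos = idx;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}
```

The cost of both calls depends on the number of deletions rather than on
maxDoc, which is the property that matters at the 1% and 2% deletion
rates in the table above.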

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff, 
> quasi_iterator_deletions_r2.diff, searchdeletes.alg, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.


