[ 
https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794617#action_12794617
 ] 

John Wang commented on LUCENE-2120:
-----------------------------------

Michael:

    I wrote a little test to measure and understand the perf.

It compares these two methods:
    static long testBits(BitVector bv, int numhits) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < numhits; ++i) {
            if (bv.get(i)) {
                // intentionally empty: only the lookup cost is measured
            }
        }
        long end = System.nanoTime();
        return end - start;
    }

    static long testSkipIter(ArrayDocIdSet delSet, int numhits) throws Exception {
        DocIdSetIterator delIter = delSet.iterator();
        int nextDelDoc = delIter.nextDoc();
        long start = System.nanoTime();
        for (int i = 0; i < numhits; ++i) {
            // consult the iterator only once i reaches the next deleted doc
            if (i >= nextDelDoc) {
                if (i == nextDelDoc) {
                    // intentionally empty: only the skip cost is measured
                }
                nextDelDoc = delIter.advance(i);
            }
        }
        long end = System.nanoTime();
        return end - start;
    }


I stripped everything down to the bare bones to isolate the perf implications.
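
For anyone who wants to try the comparison without Lucene on the classpath, here is a rough standalone sketch. It is not the test above: java.util.BitSet stands in for BitVector, and a sorted int[] with a binary search stands in for ArrayDocIdSet and its iterator. The class and method names are made up for illustration. Unlike the original, it counts the matches and returns the count, since a JIT may otherwise discard loops with empty bodies.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Random;

// Stand-ins (assumptions, not Lucene classes): BitSet for BitVector,
// sorted int[] + binary search for ArrayDocIdSet's iterator.
class DeleteCheckSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Bit-lookup approach: test one bit for every candidate doc.
    static int countDeletedWithBits(BitSet deleted, int numHits) {
        int count = 0;
        for (int i = 0; i < numHits; i++) {
            if (deleted.get(i)) {
                count++;
            }
        }
        return count;
    }

    // Skip approach: remember the next deleted doc and touch the sorted
    // array only when the loop reaches it.
    static int countDeletedWithSkip(int[] sortedDeleted, int numHits) {
        int count = 0;
        int pos = 0;
        int nextDel = sortedDeleted.length > 0 ? sortedDeleted[0] : NO_MORE_DOCS;
        for (int i = 0; i < numHits; i++) {
            if (i >= nextDel) {
                if (i == nextDel) {
                    count++;
                }
                // move to the first deleted doc strictly greater than i
                pos = firstGreaterThan(sortedDeleted, pos, i);
                nextDel = pos < sortedDeleted.length ? sortedDeleted[pos] : NO_MORE_DOCS;
            }
        }
        return count;
    }

    // Index of the first element > target, searching a[from..]; assumes
    // the array is sorted and holds distinct values.
    static int firstGreaterThan(int[] a, int from, int target) {
        int idx = Arrays.binarySearch(a, from, a.length, target + 1);
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        int numHits = 5_000_000;
        int numDeletes = 10_000;
        // random, distinct, sorted deleted doc ids (fixed seed for repeatability)
        int[] dels = new Random(42).ints(0, numHits)
                .distinct().limit(numDeletes).sorted().toArray();
        BitSet bits = new BitSet(numHits);
        for (int d : dels) {
            bits.set(d);
        }

        long t0 = System.nanoTime();
        int viaBits = countDeletedWithBits(bits, numHits);
        long t1 = System.nanoTime();
        int viaSkip = countDeletedWithSkip(dels, numHits);
        long t2 = System.nanoTime();

        System.out.println("bits: " + (t1 - t0) + " ns, found " + viaBits);
        System.out.println("skip: " + (t2 - t1) + " ns, found " + viaSkip);
    }
}
```

A single timed pass like this includes JIT warm-up, so the numbers it prints are only indicative; repeating the calls a few times before timing would give steadier figures.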

Here are the results on my MacBook Pro, by numHits and delete count
(times in nanoseconds, from System.nanoTime()):

numHits  deletes  bits (ns)  skip (ns)
5M       500      42417850   15234650
5M       100      43053650   15268850
5M       10k      41694350   17504900
5M       100k     41737350   42966000
1M       1000     8722700    3249100
1M       10k      8210650    6119700
1M       25k      8558150    9477850

As you can see, BitVector starts to win once the delete density (numDeletes 
relative to numHits) reaches about 2%, and that crossover is fairly consistent 
across the different numHits values.
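
To make the 2% figure easy to check, the small sketch below recomputes the delete density for each run and reports which approach was faster. The timings are copied from the results above; the class and method names are only for illustration.

```java
// Recomputes density and the faster approach for each run posted above.
class CrossoverCheck {
    // {numHits, numDeletes, bitsNanos, skipNanos}, copied from the results above
    static final long[][] RUNS = {
        {5_000_000, 500, 42417850L, 15234650L},
        {5_000_000, 100, 43053650L, 15268850L},
        {5_000_000, 10_000, 41694350L, 17504900L},
        {5_000_000, 100_000, 41737350L, 42966000L},
        {1_000_000, 1_000, 8722700L, 3249100L},
        {1_000_000, 10_000, 8210650L, 6119700L},
        {1_000_000, 25_000, 8558150L, 9477850L},
    };

    // Delete density as a percentage of numHits.
    static double densityPercent(long[] run) {
        return 100.0 * run[1] / run[0];
    }

    // True when the bit-lookup loop was faster than the skip loop.
    static boolean bitsWins(long[] run) {
        return run[2] < run[3];
    }

    public static void main(String[] args) {
        for (long[] r : RUNS) {
            System.out.printf("numHits=%d deletes=%d density=%.3f%% faster=%s%n",
                    r[0], r[1], densityPercent(r), bitsWins(r) ? "bits" : "skip");
        }
    }
}
```

In these seven runs the bit lookup is faster exactly in the two cases where the density is 2% or higher (5M/100k and 1M/25k).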

In real-life scenarios we see numDeletes being very small. However, it would 
be a great improvement if we could pick between the two approaches case by 
case, depending on the result set.
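
One possible shape for that choice, purely as a sketch: pick the approach from the delete density of the docs about to be scanned. Nothing here exists in Lucene. The chooser class, the enum, and the 0.02 threshold are assumptions, with the threshold taken from the rough crossover measured above on one machine.

```java
// Hypothetical chooser: not a Lucene API, just an illustration of the idea.
class DeleteCheckChooser {
    enum Strategy { BIT_LOOKUP, SKIP_ITERATOR }

    // Assumed crossover point, based on the ~2% density measured above.
    static final double CROSSOVER_DENSITY = 0.02;

    // Picks the cheaper approach for a scan over numDocsScanned docs of
    // which numDeletes are deleted.
    static Strategy choose(int numDeletes, int numDocsScanned) {
        if (numDocsScanned <= 0) {
            throw new IllegalArgumentException("numDocsScanned must be positive");
        }
        double density = (double) numDeletes / numDocsScanned;
        return density >= CROSSOVER_DENSITY ? Strategy.BIT_LOOKUP : Strategy.SKIP_ITERATOR;
    }

    public static void main(String[] args) {
        System.out.println(choose(500, 5_000_000));
        System.out.println(choose(25_000, 1_000_000));
    }
}
```

The threshold would need to be re-measured on real hardware and real indexes before relying on it.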

> Possible file handle leak in near real-time reader
> --------------------------------------------------
>
>                 Key: LUCENE-2120
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2120
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>
> Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing 
> NRT.
> I've tried to repro this, stress testing NRT, saturating reopens, indexing, 
> searching, but haven't found any issue.
> Let's try to get to the bottom of it, here...

