[
https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794617#action_12794617
]
John Wang commented on LUCENE-2120:
-----------------------------------
Michael:
I wrote a small test to measure and understand the performance.
It compares these two methods:
static long testBits(BitVector bv, int numhits) throws Exception {
  long start = System.nanoTime();
  for (int i = 0; i < numhits; ++i) {
    if (bv.get(i)) {
      // empty on purpose: only the lookup cost is measured
    }
  }
  long end = System.nanoTime();
  return (end - start);
}
static long testSkipIter(ArrayDocIdSet delSet, int numhits) throws Exception {
  DocIdSetIterator delIter = delSet.iterator();
  int nextDelDoc = delIter.nextDoc();
  long start = System.nanoTime();
  for (int i = 0; i < numhits; ++i) {
    if (i >= nextDelDoc) {
      if (i == nextDelDoc) {
        // empty on purpose: only the iteration cost is measured
      }
      nextDelDoc = delIter.advance(i);
    }
  }
  long end = System.nanoTime();
  return (end - start);
}
I stripped everything down to the bare bones to understand the perf implications.
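For readers who want to try the comparison without a Lucene checkout, here is a minimal sketch of the same idea. It is not the test above: java.util.BitSet stands in for BitVector, a sorted int[] stands in for ArrayDocIdSet, and the class and method names (DeletedDocsBench, countWithBits, countWithSkip, evenlySpaced) are made up for illustration. Unlike the original, each loop counts the deleted hits so the JIT has less room to discard the work.

```java
import java.util.BitSet;

// Stand-in benchmark. Assumptions: BitSet replaces Lucene's BitVector and a
// strictly increasing int[] replaces ArrayDocIdSet; all names are hypothetical.
final class DeletedDocsBench {

    // Random access: test one bit per hit.
    static int countWithBits(BitSet deleted, int numHits) {
        int count = 0;
        for (int i = 0; i < numHits; i++) {
            if (deleted.get(i)) {
                count++;
            }
        }
        return count;
    }

    // Skip iteration: walk the sorted deleted ids alongside the hits.
    // Requires sortedDeleted to be strictly increasing.
    static int countWithSkip(int[] sortedDeleted, int numHits) {
        int count = 0;
        int pos = 0;
        int nextDel = pos < sortedDeleted.length ? sortedDeleted[pos] : Integer.MAX_VALUE;
        for (int i = 0; i < numHits; i++) {
            if (i == nextDel) {
                count++;
                pos++;
                nextDel = pos < sortedDeleted.length ? sortedDeleted[pos] : Integer.MAX_VALUE;
            }
        }
        return count;
    }

    // Evenly spaced deleted ids: 0, step, 2*step, ...
    // Requires 0 < numDeletes <= numHits.
    static int[] evenlySpaced(int numHits, int numDeletes) {
        int[] ids = new int[numDeletes];
        int step = numHits / numDeletes;
        for (int k = 0; k < numDeletes; k++) {
            ids[k] = k * step;
        }
        return ids;
    }

    public static void main(String[] args) {
        int numHits = 5_000_000;
        int numDeletes = 500;
        int[] ids = evenlySpaced(numHits, numDeletes);
        BitSet bits = new BitSet(numHits);
        for (int id : ids) {
            bits.set(id);
        }

        long t0 = System.nanoTime();
        int viaBits = countWithBits(bits, numHits);
        long t1 = System.nanoTime();
        int viaSkip = countWithSkip(ids, numHits);
        long t2 = System.nanoTime();

        System.out.println("bits: " + (t1 - t0) + " ns, deleted=" + viaBits);
        System.out.println("skip: " + (t2 - t1) + " ns, deleted=" + viaSkip);
    }
}
```

A single timed pass like this includes JIT warm-up, so the absolute numbers will be noisy; repeating the measurement several times and discarding the first runs should give a steadier picture.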
Here are the results on my MacBook Pro, in nanoseconds (System.nanoTime() deltas), labeled by numHits and number of deletes:
5M 500:
bits: 42417850
skip: 15234650
5M 100:
bits: 43053650
skip: 15268850
5M 10k:
bits: 41694350
skip: 17504900
5M 100k:
bits: 41737350
skip: 42966000
1M 1000:
bits: 8722700
skip: 3249100
1M 10k:
bits: 8210650
skip: 6119700
1M 25k:
bits: 8558150
skip: 9477850
You can see that BitVector starts to win once the density of deletes
(numDeletes / numHits) reaches about 2%, and this is pretty consistent
across the different numHits values.
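To make the 2% figure easier to check, the sketch below recomputes the deletion density for each reported run and notes which method was faster. The timings are copied from the results listed above; the class and helper names (DensityTable, densityPercent, winner) are hypothetical.

```java
// Recomputes deletion density per run from the numbers reported in this
// message. All names here are hypothetical helpers for illustration.
final class DensityTable {

    // Deletion density as a percentage of hits.
    static double densityPercent(long numDeletes, long numHits) {
        return 100.0 * numDeletes / numHits;
    }

    // Name of the faster method given the two elapsed times in nanoseconds.
    static String winner(long bitsNanos, long skipNanos) {
        return bitsNanos < skipNanos ? "bits" : "skip";
    }

    public static void main(String[] args) {
        // Each row: {numHits, numDeletes, bits ns, skip ns}.
        long[][] runs = {
            {5_000_000, 100, 43053650L, 15268850L},
            {5_000_000, 500, 42417850L, 15234650L},
            {5_000_000, 10_000, 41694350L, 17504900L},
            {5_000_000, 100_000, 41737350L, 42966000L},
            {1_000_000, 1_000, 8722700L, 3249100L},
            {1_000_000, 10_000, 8210650L, 6119700L},
            {1_000_000, 25_000, 8558150L, 9477850L},
        };
        for (long[] r : runs) {
            System.out.printf("%d hits, %d deletes: %.3f%% -> %s%n",
                    r[0], r[1], densityPercent(r[1], r[0]), winner(r[2], r[3]));
        }
    }
}
```

The two runs where bits comes out ahead are 5M/100k (2% density) and 1M/25k (2.5% density); every run at 1% or below favors skip.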
In real-life scenarios, we see numDeletes being very small. However,
it would be a great improvement if we could choose between the two approaches
case by case, depending on the result set.
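One possible reading of that idea is a per-search switch on deletion density. The sketch below is an assumption, not an existing Lucene API: the names (DeleteCheckChooser, Strategy, DENSITY_CUTOFF, choose) are made up, and the 2% cutoff is simply the crossover observed in the numbers above, which may differ on other hardware or JVMs.

```java
// Hypothetical helper: picks a deleted-docs check based on deletion density.
// Not part of Lucene; the cutoff comes from the crossover measured above.
final class DeleteCheckChooser {

    enum Strategy { BIT_VECTOR, SKIP_ITERATOR }

    // Assumed cutoff: the roughly 2% density at which BitVector started to win.
    static final double DENSITY_CUTOFF = 0.02;

    static Strategy choose(int numDeletes, int numHits) {
        if (numHits <= 0) {
            // Nothing to scan, so the choice does not matter.
            return Strategy.BIT_VECTOR;
        }
        double density = (double) numDeletes / numHits;
        return density >= DENSITY_CUTOFF ? Strategy.BIT_VECTOR : Strategy.SKIP_ITERATOR;
    }
}
```

For example, choose(500, 5_000_000) selects SKIP_ITERATOR, while choose(100_000, 5_000_000) selects BIT_VECTOR. In practice the cutoff would probably need to be tuned or made configurable rather than hard-coded.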
> Possible file handle leak in near real-time reader
> --------------------------------------------------
>
> Key: LUCENE-2120
> URL: https://issues.apache.org/jira/browse/LUCENE-2120
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.1
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.1
>
>
> Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing
> NRT.
> I've tried to repro this, stress testing NRT, saturating reopens, indexing,
> searching, but haven't found any issue.
> Let's try to get to the bottom of it, here...