[ 
https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790673#action_12790673
 ] 

Michael McCandless commented on LUCENE-2120:
--------------------------------------------

Thanks for the details John!

{quote}
bq: Why does Zoie even retain 3 readers? Why not keep only the current one?

1 mem reader for when the disk batch, 1 mem reader for the time disk reader 
indexes, 1 disk reader
{quote}

Hmm -- is this what the {{private static int MAX_READER_GENERATION = 3;}} in 
ThrottledLuceneNRTDataConsumer (link above) is doing? From that code, it looks 
like it just always retains the last 3 reopened readers... it sounds like the 
logic to keep the 2 mem readers & 1 disk reader is elsewhere (in addition) in 
Zoie?  Or maybe this logic is what's doing that?  (Not yet familiar enough w/ 
Zoie...).
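
To make the question concrete, here is a minimal sketch of what "always retain the last 3 reopened readers" would amount to. This is not Zoie's code: {{ReaderWindow}} and its methods are hypothetical names, and it is written against plain {{java.io.Closeable}} rather than any Lucene class. Only the value 3 comes from the constant quoted above.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: keeps only the newest maxGenerations readers open. */
final class ReaderWindow<R extends Closeable> {
    private final int maxGenerations;
    private final Deque<R> readers = new ArrayDeque<>();

    ReaderWindow(int maxGenerations) {
        if (maxGenerations < 1) {
            throw new IllegalArgumentException("maxGenerations must be >= 1");
        }
        this.maxGenerations = maxGenerations;
    }

    /** Registers a freshly reopened reader and closes any that fall out of the window. */
    synchronized void add(R reader) {
        readers.addLast(reader);
        while (readers.size() > maxGenerations) {
            try {
                // The oldest generation is released here; if this step is
                // skipped, its file handles stay open.
                readers.removeFirst().close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Number of readers currently retained. */
    synchronized int size() {
        return readers.size();
    }
}
```

With {{new ReaderWindow<>(3)}}, the fourth {{add}} closes the first reader. In a real setup, searches may still be running against an evicted reader, so one would presumably release a reference (e.g. {{IndexReader.decRef()}}) rather than close it outright; that part is an assumption about how this would be wired up, not something taken from Zoie.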

bq. By default, it only runs with Medline data. You don't need both. 
perf/settings/index.properties->data.type dictates which to use, file->medline, 
wiki->wikipedia

OK I'll try to use only Wikipedia -- I've already slurped that down.  It'd be 
great to get a ContentSource in contrib/benchmark that can produce docs from 
medline...

{quote}
You should use the branch: BR_DELETE_OPT

It has the optimization you suggested on handling deleted docs, e.g. should not 
check for each hit candidate with IntSetAccelerator.
Also, I have added a DataConsumer to handle delayed reopen for NRT case. You 
see the file handle leakage quickly with it: see perf/conf/zoie.properties to 
turn on ThrottledLuceneNRTDataConsumer.

On my mac, I use lsof to see the file handle count.
{quote}

OK, will do.  So (on this branch) you resolve the deleted docs in the 
background, so that, once that's finished, the reader no longer double-checks 
hits for deletions?  That sounds like a good improvement...

It's sort of a "warm in the background" tradeoff, i.e., give me my reader very 
quickly, even if the first searches against it must run a bit slower since they 
double-check deletions until the warming is done... vs. Lucene, which 
forcefully "warms" (making reopen time longer) before returning the reader to 
you.
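
A rough sketch of that tradeoff as I understand it, under assumptions: {{BackgroundWarmedReader}}, {{pendingDeletes}} and the other names are hypothetical, and deletions are modeled with a plain {{BitSet}} plus a set of doc IDs instead of Zoie's or Lucene's actual structures.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical reader wrapper that resolves pending deletes in the background. */
final class BackgroundWarmedReader {
    private final BitSet liveDocs;              // deletions known when the reader was opened
    private final Set<Integer> pendingDeletes;  // deletions not yet folded into liveDocs
    private volatile BitSet warmedLiveDocs;     // non-null once warming has finished

    BackgroundWarmedReader(BitSet liveDocs, Set<Integer> pendingDeletes) {
        this.liveDocs = (BitSet) liveDocs.clone();
        this.pendingDeletes = new HashSet<>(pendingDeletes);
    }

    /** Returns the reader immediately and starts warming on a background thread. */
    static BackgroundWarmedReader open(BitSet liveDocs, Set<Integer> pendingDeletes) {
        BackgroundWarmedReader reader = new BackgroundWarmedReader(liveDocs, pendingDeletes);
        CompletableFuture.runAsync(reader::warm);
        return reader;
    }

    /** Folds the pending deletes into one bitset; safe to call more than once. */
    void warm() {
        BitSet merged = (BitSet) liveDocs.clone();
        for (int doc : pendingDeletes) {
            merged.clear(doc);
        }
        warmedLiveDocs = merged; // volatile write publishes the merged bitset
    }

    boolean isWarmed() {
        return warmedLiveDocs != null;
    }

    /** Before warming: two checks per hit. After warming: a single bitset lookup. */
    boolean isLive(int doc) {
        BitSet warmed = warmedLiveDocs;
        if (warmed != null) {
            return warmed.get(doc);
        }
        return liveDocs.get(doc) && !pendingDeletes.contains(doc);
    }
}
```

Searches get correct answers either way; the only difference is the per-hit cost until {{warm()}} completes. The blocking alternative would call {{warm()}} inside {{open}} before returning, which lengthens reopen time but keeps every search on the single-lookup path.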

> Possible file handle leak in near real-time reader
> --------------------------------------------------
>
>                 Key: LUCENE-2120
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2120
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.1
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>
> Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing 
> NRT.
> I've tried to repro this, stress testing NRT, saturating reopens, indexing, 
> searching, but haven't found any issue.
> Let's try to get to the bottom of it, here...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
