[
https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790673#action_12790673
]
Michael McCandless commented on LUCENE-2120:
--------------------------------------------
Thanks for the details John!
{quote}
bq. Why does Zoie even retain 3 readers? Why not keep only the current one?
1 mem reader for the disk batch, 1 mem reader for the time the disk reader
indexes, 1 disk reader
{quote}
Hmm -- is this what the {{private static int MAX_READER_GENERATION = 3;}} in
ThrottledLuceneNRTDataConsumer (link above) is doing? From that code, it looks
like it simply retains the last 3 reopened readers. So is the logic that keeps
the 2 mem readers & 1 disk reader somewhere else in Zoie (in addition to this)?
Or is this the logic that does that? (I'm not yet familiar enough w/
Zoie...)
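To make the "retain the last 3 reopened readers" idea concrete, and to show where a file handle leak could come from, here is a minimal sketch. It is not Zoie's code: the class name ReaderGenerations is hypothetical, and a plain java.io.Closeable stands in for a Lucene IndexReader so the example runs on its own. The point is that every retained reader pins its files open, so the oldest generation has to be closed whenever a new one is added.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: keep at most maxGenerations readers open, closing the oldest on overflow. */
public class ReaderGenerations<R extends Closeable> {
    private final int maxGenerations;
    private final Deque<R> readers = new ArrayDeque<>();

    public ReaderGenerations(int maxGenerations) {
        if (maxGenerations < 1) {
            throw new IllegalArgumentException("maxGenerations must be >= 1");
        }
        this.maxGenerations = maxGenerations;
    }

    /** Adds a freshly reopened reader; closes and evicts the oldest ones once the limit is exceeded. */
    public synchronized void add(R newest) throws IOException {
        readers.addLast(newest);
        while (readers.size() > maxGenerations) {
            // Skipping this close() would leak the evicted reader's file handles on every reopen.
            readers.removeFirst().close();
        }
    }

    /** Returns the most recently added reader, or null if none has been added. */
    public synchronized R current() {
        return readers.peekLast();
    }

    /** Number of reader generations currently held open. */
    public synchronized int size() {
        return readers.size();
    }

    /** Closes every retained reader, oldest first. */
    public synchronized void closeAll() throws IOException {
        while (!readers.isEmpty()) {
            readers.removeFirst().close();
        }
    }
}
```

With maxGenerations = 3, the fourth add() closes the first reader. A real implementation would also need reference counting, since a search may still be running against the reader being evicted; closing it out from under that search is the opposite failure from a leak.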
bq. By default, it only runs with Medline data. You don't need both.
The data.type property in perf/settings/index.properties dictates which to use:
file -> medline, wiki -> wikipedia
OK, I'll try using only Wikipedia -- I've already slurped that down. It'd be
great to have a ContentSource in contrib/benchmark that can produce docs from
Medline...
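The data.type switch quoted above amounts to a two-entry mapping. A minimal sketch using java.util.Properties; the class and method names are hypothetical, and falling back to Medline when data.type is absent is an assumption based only on "by default, it only runs with Medline data", not on how the perf harness actually reads the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical sketch: map the data.type property to the corpus it selects. */
public class DataTypeSelector {
    public static String corpusFor(Properties props) {
        // Assumption: a missing data.type falls back to "file" (Medline).
        String type = props.getProperty("data.type", "file").trim();
        switch (type) {
            case "file":
                return "medline";
            case "wiki":
                return "wikipedia";
            default:
                throw new IllegalArgumentException("unknown data.type: " + type);
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // In the perf setup this would be loaded from perf/settings/index.properties.
        props.load(new StringReader("data.type=wiki\n"));
        System.out.println(corpusFor(props)); // prints "wikipedia"
    }
}
```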
{quote}
You should use the branch BR_DELETE_OPT.
It has the optimization you suggested for handling deleted docs, i.e., it no
longer checks each hit candidate with IntSetAccelerator.
Also, I have added a DataConsumer to handle delayed reopen for the NRT case.
You will see the file handle leak quickly with it; see perf/conf/zoie.properties
to turn on ThrottledLuceneNRTDataConsumer.
On my Mac, I use lsof to see the file handle count.
{quote}
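Besides running lsof against the JVM's pid (e.g. {{lsof -p <pid>}}), a stress test can watch the descriptor count from inside the process and flag a count that keeps climbing across reopens. A minimal sketch, assuming a HotSpot/OpenJDK JVM on a Unix-like OS, where the platform OperatingSystemMXBean implements com.sun.management.UnixOperatingSystemMXBean; on other platforms it returns -1. The class name FdCount is hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reports this JVM's open file descriptor count, an in-process alternative to lsof. */
public class FdCount {
    /** Returns the open file descriptor count, or -1 if the JVM does not expose it. */
    public static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1; // e.g. Windows, or a JVM without the com.sun.management extension
    }

    public static void main(String[] args) throws IOException {
        System.out.println("open fds before: " + openFileDescriptors());
        File tmp = File.createTempFile("fdcount", ".tmp");
        tmp.deleteOnExit();
        try (FileInputStream in = new FileInputStream(tmp)) {
            // The stream holds one descriptor until the try block closes it.
            System.out.println("open fds while a stream is open: " + openFileDescriptors());
        }
        System.out.println("open fds after close: " + openFileDescriptors());
    }
}
```

In an NRT stress test, one would sample this after each reopen (once old readers are expected to be closed) and fail if it grows without bound.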
OK, will do. So (on this branch) you resolve the deleted docs in the
background, so that, once that finishes, the reader no longer double-checks
hits for deletions? That sounds like a good improvement...
It's sort of a "warm in the background" tradeoff, i.e., give me my reader very
quickly, even if the first searches against it must run a bit slower because
they double-check deletions until the warming is done... vs. Lucene, which
forcefully "warms" (making reopen time longer) before returning the reader to
you.
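The "warm in the background" tradeoff can be sketched as a reader whose deletion check has two paths. This is a hypothetical illustration, not the BR_DELETE_OPT code: the class name, the docID-to-UID array, and the use of a hash set (standing in for IntSetAccelerator) are all assumptions. Before warming completes, each hit is checked against the raw set of deleted UIDs; a background task resolves those UIDs into a per-docID BitSet and publishes it, after which checks become a single bit lookup.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: usable immediately, cheaper deletion checks once background warming finishes. */
public class BackgroundDeletes {
    private final long[] uidByDoc;       // assumed mapping: docID -> application UID
    private final Set<Long> deletedUids; // deletions not yet resolved to docIDs (must be immutable)
    private volatile BitSet resolved;    // null until background resolution finishes

    public BackgroundDeletes(long[] uidByDoc, Set<Long> deletedUids) {
        this.uidByDoc = uidByDoc;
        this.deletedUids = deletedUids;
    }

    /** Starts resolving deletions on the executor; searches may run before this completes. */
    public CompletableFuture<Void> warm(ExecutorService executor) {
        return CompletableFuture.runAsync(() -> {
            BitSet bits = new BitSet(uidByDoc.length);
            for (int doc = 0; doc < uidByDoc.length; doc++) {
                if (deletedUids.contains(uidByDoc[doc])) {
                    bits.set(doc);
                }
            }
            resolved = bits; // volatile write publishes the fully built BitSet
        }, executor);
    }

    /** Slow path (hash lookup per hit) before warming, single bit check afterwards. */
    public boolean isDeleted(int doc) {
        BitSet bits = resolved;
        if (bits != null) {
            return bits.get(doc);
        }
        return deletedUids.contains(uidByDoc[doc]);
    }

    public boolean isWarm() {
        return resolved != null;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        BackgroundDeletes r = new BackgroundDeletes(new long[] {10L, 11L, 12L}, Set.of(11L));
        CompletableFuture<Void> warming = r.warm(executor);
        System.out.println(r.isDeleted(1)); // true, via whichever path is active
        warming.join();
        System.out.println(r.isWarm() + " " + r.isDeleted(1) + " " + r.isDeleted(2)); // true true false
        executor.shutdown();
    }
}
```

Both paths give the same answer; only the cost per hit differs. Lucene's approach corresponds to calling warm(...).join() inside reopen, before the reader is handed back.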
> Possible file handle leak in near real-time reader
> --------------------------------------------------
>
> Key: LUCENE-2120
> URL: https://issues.apache.org/jira/browse/LUCENE-2120
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.1
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.1
>
>
> Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing
> NRT.
> I've tried to repro this, stress testing NRT, saturating reopens, indexing,
> searching, but haven't found any issue.
> Let's try to get to the bottom of it, here...
--
This message is automatically generated by JIRA.