[ 
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869562#action_12869562
 ] 

Michael McCandless commented on LUCENE-2468:
--------------------------------------------

{quote}
bq. So... why not do this in CachingWrapper/SpanFilter, but, instead of 
discarding the cache entry when deletions must be enforced, we dynamically 
apply the deletions? (I think we could use FilteredDocIdSet).

Yea, that would work well. You will need to somehow still know when to enable 
or disable this based on the filter you use (it should basically only be 
enabled ones that are passed to constant score... {quote}

OK I'll take that approach on next iter.

But: I think this may need to be enabled in other cases where the
filter is used (ie not only CSQ).  Sure, CSQ is the one example we
have today, where if you pass a Filter that ignores "recent" deletions
you'll be in trouble... but who knows what other uses of a Filter
might trip up on this intentional cache-incoherence we're introducing.

bq. Agreed. As I see it, caching based on IndexReader is key in Lucene, and 
with NRT, it should feel the same way as it is without it. NRT should not 
change the way you build your system.

Well... NRT and up-to-date deletions will always present a challenge.

Really, this tradeoff we are making here, where a cached filter can be
set to either 1) ignore new deletions, 2) discard its cache entry and
fully regenerate itself, or 3) dynamically intersect the deletions, is
similar to the discussions we've had about just how an NRT segment
reader should enforce recent deletions.

Ie, ignoring option 1 (which of course gives the best perf), option 2,
while making a reopen more costly, gets you the best search
performance (since only one bit set is checked during searches).

Option 3 makes reopens much faster, but then search performance takes a
hit (since you're checking 2 bit sets).

Option 2 is analogous to how Lucene now handles the per-segment
deleted docs bit vector (it's fully recreated on each reopen), while
option 3 is analogous to how Zoie handles deletions (new deletions are
dynamically applied to all search hits).
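
To make the three options concrete, here is a minimal sketch that uses
plain java.util.BitSet objects in place of Lucene's cached filter and
deleted-docs structures. The class and method names are hypothetical and
are not part of the patch or of any Lucene API; "regenerate" only
approximates option 2 by folding deletions into a fresh copy, whereas the
real thing would re-run the filter against the reopened reader.

```java
import java.util.BitSet;

/** Illustration only: BitSets stand in for a cached filter and a deleted-docs set. */
final class CachedFilterSketch {

    /** Option 1: serve the cached bits unchanged; deletions made after caching are ignored. */
    static BitSet ignoreNewDeletions(BitSet cached) {
        return cached;
    }

    /**
     * Option 2 (approximation): pay once at reopen to build a new entry with
     * deletions already removed, so searches check a single bit set.
     */
    static BitSet regenerate(BitSet matches, BitSet deleted) {
        BitSet fresh = (BitSet) matches.clone(); // leave the old entry untouched
        fresh.andNot(deleted);                   // clear every deleted doc
        return fresh;
    }

    /**
     * Option 3: keep the cached entry and consult the deletions per document
     * at search time, so each candidate costs two bit-set lookups.
     */
    static boolean acceptsAtSearchTime(BitSet cached, BitSet deleted, int doc) {
        return cached.get(doc) && !deleted.get(doc);
    }
}
```

With this shape, option 2 does work proportional to the segment size on
each reopen, while option 3 does no work on reopen and one extra lookup
per candidate document during search.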


> reopen on NRT reader should share readers w/ unchanged segments
> ---------------------------------------------------------------
>
>                 Key: LUCENE-2468
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2468
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Michael McCandless
>         Attachments: CacheTest.java, DeletionAwareConstantScoreQuery.java, 
> LUCENE-2468.patch, LUCENE-2468.patch
>
>
> A reopen on an NRT reader doesn't seem to share readers for those segments 
> that are unchanged.
> http://search.lucidimagination.com/search/document/9f0335d480d2e637/nrt_and_caching_based_on_indexreader
