Also Mike - even if the writer has committed, and I then notify the
other nodes that they should refresh, it's still possible for them to
hit this exception, right?
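
Something like this defensive retry on the reader nodes is what I'm
picturing (just a sketch - the helper name, retry count and sleep are
arbitrary):

  import java.io.FileNotFoundException;
  import java.io.IOException;

  import org.apache.lucene.index.IndexReader;

  public class RetryReopen {
    // Treat the FNFE as transient: the stale client cache usually
    // catches up after a while, so retry the reopen a few times.
    public static IndexReader reopenWithRetry(IndexReader reader)
        throws IOException, InterruptedException {
      FileNotFoundException last = null;
      for (int attempt = 0; attempt < 5; attempt++) {
        try {
          IndexReader newReader = reader.reopen();  // no-op if unchanged
          if (newReader != reader) {
            reader.close();  // swap to the fresher reader
          }
          return newReader;
        } catch (FileNotFoundException fnfe) {
          last = fnfe;
          Thread.sleep(1000);  // give the client cache time to refresh
        }
      }
      throw last;  // still failing after retries - give up
    }
  }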

On Fri, Aug 14, 2009 at 1:02 AM, Shai Erera <ser...@gmail.com> wrote:

> How can the writer delete all previous segments? If I have a reader open,
> doesn't it prevent those files from being deleted? That's why I count on
> at least one of those files existing. Perhaps I'm wrong though.
>
> I think we can come up w/ some notification mechanism, through MQ or
> something.
>
> Do you think it's worth documenting on the Wiki? The entry about FNFE
> during searches mentions NFS or SMB, but does not mention
> SimpleFSLockFactory (which solves a different problem). Maybe we can add
> that info there?
>
> Shai
>
>
> On Fri, Aug 14, 2009 at 12:50 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> On Thu, Aug 13, 2009 at 5:33 PM, Shai Erera <ser...@gmail.com> wrote:
>>
>> > So if afterwards we read until segments_17 and exhaust the read-ahead,
>> > and we determine that there's a problem - we throw the exception. If
>> > instead we try to read backwards, I'm sure one of the segments will be
>> > read successfully, because that reader must already be able to see one
>> > of those segments, right?
>>
>> I don't think you're guaranteed to read successfully when reading
>> backwards.
>>
>> Ie, say the writer has committed segments_8, and therefore just removed
>> segments_7.
>>
>> When the reader (on a different machine, w/ a stale cache) tries to
>> open, its cache claims segments_7 still exists, so we try to open that
>> but fail.  We advance to segments_8 and try to open that, but fail
>> (presumably because the local SMB2 cache doesn't consult the server,
>> unlike many NFS clients, I think).  We then try up through segments_17
>> and nothing works.  But going backwards can't work either, because
>> those segments files have all been deleted.  (Assuming
>> KeepOnlyLastCommitDeletionPolicy... things do get more interesting if
>> you're using a different deletion policy...).
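>>
>> E.g., a deletion policy roughly along these lines (just a sketch
>> against the 2.4 API - the class name and the expiration window are
>> made up) keeps each old commit around for a while after a newer
>> commit appears, so a reader w/ a stale cache still has something it
>> can open:
>>
>>   import java.io.IOException;
>>   import java.util.Iterator;
>>   import java.util.List;
>>
>>   import org.apache.lucene.index.IndexCommit;
>>   import org.apache.lucene.index.IndexDeletionPolicy;
>>   import org.apache.lucene.store.Directory;
>>
>>   // Keeps each commit until it's at least expirationMillis older
>>   // than the newest commit, so lagging readers get a grace period.
>>   public class ExpirationTimeDeletionPolicy
>>       implements IndexDeletionPolicy {
>>
>>     private final Directory dir;
>>     private final long expirationMillis;
>>
>>     public ExpirationTimeDeletionPolicy(Directory dir,
>>                                         long expirationMillis) {
>>       this.dir = dir;
>>       this.expirationMillis = expirationMillis;
>>     }
>>
>>     public void onInit(List commits) throws IOException {
>>       onCommit(commits);
>>     }
>>
>>     public void onCommit(List commits) throws IOException {
>>       IndexCommit last = (IndexCommit) commits.get(commits.size() - 1);
>>       long newest = dir.fileModified(last.getSegmentsFileName());
>>       Iterator it = commits.iterator();
>>       while (it.hasNext()) {
>>         IndexCommit commit = (IndexCommit) it.next();
>>         long modified = dir.fileModified(commit.getSegmentsFileName());
>>         if (commit != last && newest - modified > expirationMillis) {
>>           commit.delete();  // expired; stale readers had their chance
>>         }
>>       }
>>     }
>>   }
>>
>> On the writer node you'd pass an instance to one of IndexWriter's
>> constructors that takes an IndexDeletionPolicy.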
>>
>> Sadly, the most common approach to refreshing readers, e.g. checking
>> every N seconds whether it's time to reopen, leads directly to this
>> "cache is holding onto stale data" problem.  My guess is that if an
>> app only attempted to reopen the reader after the writer on another
>> machine had committed, then this exception wouldn't happen.  But
>> that'd require some notification mechanism outside of Lucene.
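>>
>> E.g. (just a sketch - JMS is only an example transport, and the class
>> and wiring here are hypothetical): the writer node publishes a message
>> right after IndexWriter.commit(), and each reader node reopens only
>> when that message arrives:
>>
>>   import java.io.IOException;
>>
>>   import javax.jms.Message;
>>   import javax.jms.MessageListener;
>>
>>   import org.apache.lucene.index.IndexReader;
>>
>>   // Reader-node side: reopen only on an explicit commit notification.
>>   public class ReopenOnCommit implements MessageListener {
>>     private volatile IndexReader reader;
>>
>>     public ReopenOnCommit(IndexReader initial) {
>>       this.reader = initial;
>>     }
>>
>>     public void onMessage(Message commitNotification) {
>>       try {
>>         IndexReader newReader = reader.reopen();  // sees the new commit
>>         if (newReader != reader) {
>>           // NOTE: real code must make sure no in-flight search still
>>           // uses the old reader before closing it.
>>           reader.close();
>>           reader = newReader;
>>         }
>>       } catch (IOException e) {
>>         // a stale client cache could still throw FNFE here; retry later
>>       }
>>     }
>>
>>     public IndexReader getReader() {
>>       return reader;
>>     }
>>   }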
>>
>> Mike
>>
>
