Also, Mike - even if the writer has committed and I then notify the other nodes that they should refresh, it's still possible for them to hit this exception, right?
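To make sure we're talking about the same flow, here is a rough sketch of what I mean (plain Java; CommitListener and the two class names are made up for illustration - in practice the notification would travel over MQ or similar, since the writer and the readers sit on different machines):

    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    // Made-up callback; in reality this would be a message published to a
    // queue/topic that the reader machines subscribe to.
    interface CommitListener {
        void indexCommitted();
    }

    class NotifyingWriter {
        private final IndexWriter writer;
        private final CommitListener listener;

        NotifyingWriter(IndexWriter writer, CommitListener listener) {
            this.writer = writer;
            this.listener = listener;
        }

        void commitAndNotify() throws IOException {
            writer.commit();           // segments_N is durable once this returns
            listener.indexCommitted(); // readers refresh only after this point
        }
    }

    class RefreshingReader implements CommitListener {
        private volatile IndexReader reader;

        RefreshingReader(IndexReader reader) {
            this.reader = reader;
        }

        public void indexCommitted() {
            try {
                IndexReader newReader = reader.reopen(); // same instance if unchanged
                if (newReader != reader) {
                    IndexReader old = reader;
                    reader = newReader;
                    old.close(); // unsafe if searches are in flight; refcount in real code
                }
            } catch (IOException e) {
                // Can this still be an FNFE from a stale NFS/SMB cache?
                // That's exactly my question above.
            }
        }
    }

Even with that ordering, reopen() still goes through the OS client cache on the reader's machine, which is why I'm asking whether the exception can still happen.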
On Fri, Aug 14, 2009 at 1:02 AM, Shai Erera <ser...@gmail.com> wrote:
> How can the writer delete all previous segments? If I have a reader open,
> doesn't it prevent those files from being deleted? That's why I count on at
> least one of those files existing. Perhaps I'm wrong though.
>
> I think we can come up w/ some notification mechanism, through MQ or
> something.
>
> Do you think it's worth documenting on the Wiki? The entry about FNFE
> during searches mentions NFS or SMB, but does not mention
> SimpleFSLockFactory (which solves a different problem). Maybe we can add
> that info there?
>
> Shai
>
> On Fri, Aug 14, 2009 at 12:50 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> On Thu, Aug 13, 2009 at 5:33 PM, Shai Erera <ser...@gmail.com> wrote:
>>
>> > So if afterwards we read until segments_17 and exhaust read-ahead, and
>> > we determine that there's a problem - we throw the exception. If instead
>> > we try to read backwards, I'm sure one of the segments will be read
>> > successfully, because that reader must have already seen one of those
>> > segments, right?
>>
>> I don't think you're guaranteed to read successfully when reading
>> backwards.
>>
>> Ie, say the writer has committed segments_8, and therefore just removed
>> segments_7.
>>
>> When the reader (on a different machine, w/ a stale cache) tries to
>> open, its cache claims segments_7 still exists, so we try to open
>> that but fail. We advance to segments_8 and try to open that, but
>> fail (presumably because the local SMB2 cache doesn't consult the
>> server, unlike many NFS clients, I think). We then try up through
>> segments_17 and nothing works. But going backwards can't work either,
>> because those segments files have all been deleted. (Assuming
>> KeepOnlyLastCommitDeletionPolicy... things do get more interesting if
>> you're using a different deletion policy...)
>>
>> Sadly, the most common approach to refreshing readers, eg checking
>> every N seconds if it's time to reopen, leads directly to this "cache
>> is holding onto stale data" problem. My guess is that if an app only
>> attempted to reopen the reader after the writer on another machine had
>> committed, then this exception wouldn't happen. But that'd require some
>> notification mechanism outside of Lucene.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
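PS - on Mike's parenthetical about deletion policies: one way to make stale readers more forgiving is a policy that keeps each commit around for a grace period instead of deleting it right away. A rough sketch (the class name and the grace period are mine; if I recall correctly, Lucene's test sources have a similar ExpirationTimeDeletionPolicy):

    import java.io.IOException;
    import java.util.List;

    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexDeletionPolicy;
    import org.apache.lucene.store.Directory;

    // Keeps every commit for keepMillis after it was written, so a reader
    // with a stale directory cache can still open a recent (if not the
    // latest) commit. The newest commit is never expired.
    class KeepForAWhileDeletionPolicy implements IndexDeletionPolicy {

        private final Directory dir;
        private final long keepMillis;

        KeepForAWhileDeletionPolicy(Directory dir, long keepMillis) {
            this.dir = dir;
            this.keepMillis = keepMillis;
        }

        public void onInit(List commits) throws IOException {
            onCommit(commits);
        }

        public void onCommit(List commits) throws IOException {
            long now = System.currentTimeMillis();
            // Never delete the most recent commit, no matter how old.
            for (int i = 0; i < commits.size() - 1; i++) {
                IndexCommit commit = (IndexCommit) commits.get(i);
                long modTime = dir.fileModified(commit.getSegmentsFileName());
                if (now - modTime > keepMillis) {
                    commit.delete(); // flag it; IndexWriter removes the files
                }
            }
        }
    }

You'd pass this to one of the IndexWriter constructors that takes an IndexDeletionPolicy, with keepMillis comfortably larger than the readers' refresh interval, so a reader whose cache is stuck on segments_7 can still open it for a while after segments_8 lands.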