What about reading backwards as well? I.e., if the read-ahead fails, try
reading backwards. We should be able to read at least some segment, no?
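
To make the idea concrete, here is a minimal sketch of read-ahead followed by
read-back. It is not Lucene's FindSegmentsFile: GenProbe, GenReader,
readAheadThenBack, and the lookahead/lookback parameters are hypothetical
names, and the base-36 file naming is my understanding of how names such as
segments_p8 are formed. Whether an older segments_N is still on disk depends
on the deletion policy, so the backward pass may well find nothing.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical sketch of "read ahead, then read backwards"; not Lucene code. */
final class GenProbe {

    /** Hypothetical callback that tries to open and read segments_<gen>. */
    interface GenReader<T> {
        T read(long gen) throws IOException;
    }

    /** Assumption: the generation is encoded in base 36, e.g. 908 -> segments_p8. */
    static String segmentsFileName(long gen) {
        return "segments_" + Long.toString(gen, Character.MAX_RADIX);
    }

    static <T> T readAheadThenBack(long startGen, int lookahead, int lookback,
                                   GenReader<T> reader) throws IOException {
        FileNotFoundException last = null;

        // Forward pass: startGen, startGen + 1, ..., startGen + lookahead.
        for (long gen = startGen; gen <= startGen + lookahead; gen++) {
            try {
                return reader.read(gen);
            } catch (FileNotFoundException e) {
                last = e; // remember the failure and try the next generation
            }
        }

        // Backward pass: startGen - 1 down to startGen - lookback (never below 1).
        long lowest = Math.max(1, startGen - lookback);
        for (long gen = startGen - 1; gen >= lowest; gen--) {
            try {
                return reader.read(gen);
            } catch (FileNotFoundException e) {
                last = e;
            }
        }

        if (last != null) {
            throw last;
        }
        throw new FileNotFoundException(
                "no readable segments file near " + segmentsFileName(startGen));
    }
}
```

A reader found by the backward pass would be stale, so the caller would still
need to check for a newer commit later.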

Shai

On Thu, Aug 13, 2009 at 9:29 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> This is spooky -- it looks like SMB2 (which was introduced with Windows
> Vista & Windows Server 2008) now does "aggressive" client-side
> caching, such that the cache can be wrong about the current state of
> the directory.
>
> At least it sort of sounds like Microsoft considers it a real issue:
>
> > Yes, this is a known product issue of SMB2.
> >
> > SMB2 does implicit attribute and directory metadata caching at all
> > times, whereas SMB1 was much stricter about when it would do so. The
> > caches are consistent when changes are made by the client, but if
> > changes are made from another client they may not be reflected until
> > the cache times out.
>
> This will definitely cause problems (like the exception you're
> hitting) for Lucene.  It's exactly the same problems we had with NFS,
> but the readahead in SegmentInfos.FindSegmentsFile worked around that.
> It sounds like for SMB2 that readahead is not working, presumably
> because (unlike NFS) the client does not check back w/ the server if
> it believes (based on its stale cache) that the file does not exist.  Sigh.
>
> SMB1 did not have this problem, in my experience.
>
> I wonder if, from javaland, we have some way to force the cache to
> become coherent.
>
> One simple workaround at the app level is to simply retry on hitting
> an errant "segments_N file not found" exception.
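
A rough sketch of such an app-level retry, under stated assumptions: the names
StaleCacheRetry, IndexAction, and retryOnMissingFile are hypothetical, and the
action passed in stands for whatever Lucene call hit the exception (for
example an isCurrent or reopen call). The attempt count and delay are
placeholders that would need tuning against the SMB2 cache timeout.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical helper: retry an index operation that may hit a stale directory cache. */
final class StaleCacheRetry {

    /** Hypothetical stand-in for the Lucene call being retried. */
    interface IndexAction<T> {
        T run() throws IOException;
    }

    static <T> T retryOnMissingFile(IndexAction<T> action, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        FileNotFoundException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (FileNotFoundException e) {
                last = e;
                // Wait before the next attempt so the client cache has a chance to expire.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // Every attempt failed: surface the most recent exception to the caller.
        throw last;
    }
}
```

Only FileNotFoundException is retried; any other IOException propagates
immediately, since it is unlikely to be caused by the stale cache.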
>
> Mike
>
> On Thu, Aug 13, 2009 at 8:50 AM, Shai Erera<ser...@gmail.com> wrote:
> > Hi
> >
> > Has anyone experienced any problems w/ Lucene indexes on a shared SMB2
> > network drive?
> >
> > We've hit a scenario where it seems the FS cache refuses to check for
> > existence of files on the shared network drive. Specifically, we hit the
> > following exception:
> >
> > java.io.FileNotFoundException: Z:\index\segments_p8 (The system cannot find the file specified.)
> >     at java.io.RandomAccessFile.open(Native Method)
> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
> >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
> >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
> >     at org.apache.lucene.index.SegmentInfos$2.doBody(SegmentInfos.java:369)
> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
> >     at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
> >     at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
> >     at org.apache.lucene.index.MultiReader.isCurrent(MultiReader.java:352)
> >
> > The environment:
> > * 3 Windows Server 2008 machines
> > ** Machine A - hosts the index
> > ** Machine B - indexes and searches
> > ** Machine C - searches only
> > * Machines B and C map Machine A as drive Z.
> > * The exception happens on Machine C only, i.e. on the machine that does
> > just 'search'.
> >
> > According to my understanding, FindSegmentsFile attempts to find the latest
> > segments file from segments.gen and the directory listing, and if there is a
> > problem, it will do a gen-readahead until success or until
> > defaultGenLookaheadCount is exhausted.
> >
> > So by hitting this exception we thought of the following explanation: the FS
> > cache 'decides' the file does not exist, due to a stale directory cache, and
> > refuses to check whether the file actually exists on the remote machine.
> >
> > Does that sound reasonable?
> >
> > Some more information:
> > * We use Lucene 2.4.0
> > * Other runs are executed on those machines currently, and so it will take
> > about a week until we can run the same scenario again. I thought that
> > perhaps we can discuss this until then.
> > * Unfortunately we weren't able to get an infoStream output before the
> > machines started another run, so we hope to get it next time. Anyway, it's
> > not easily reproduced.
> > * There isn't any other process which touches this directory, such that it
> > may remove index files.
> >
> > We know the same code runs well on NFS (4). We haven't checked yet if SMB
> > 1.0 works ok. Some pointers we've found:
> >
> > A known issue on MS, w/ some C++ fixes:
> >
> > http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.win32.programmer.networks&tid=69e63e38-7d91-4306-ab6e-a615e1c6afaa&cat=en_US_bc89adf4-f184-4d3d-aaee-122567385744&lang=en&cr=US&sloc=&p=1
> >
> > Info on how to disable SMB 2.0 on Windows:
> >
> > http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm
> >
> > Currently, we plan to bypass the problem by wrapping calls to isCurrent and
> > reopen w/ a try-catch for FileNotFoundException and using the reader we have
> > at hand. Later, we will attempt the isCurrent again. Since SMB caching seems
> > to be time-controlled, we expect the cache to be refreshed after several
> > seconds, and those calls will succeed.
> > I wonder though if this can't get us into hitting the exception 'forever'.
> > E.g., imagine a system which indexes at very high rates. Isn't it possible
> > that we'll hit this exception every time we call isCurrent?
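
One way to picture this bypass, including a guard against the 'forever' case,
is the sketch below. It is only an illustration: TolerantFreshnessCheck,
CurrentCheck, and needsReopen are hypothetical names, and CurrentCheck stands
in for the real isCurrent call. The limit on consecutive failures is an
assumed policy, not something Lucene provides.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical guard around an isCurrent-style check; not part of Lucene. */
final class TolerantFreshnessCheck {

    /** Hypothetical stand-in for a call such as reader.isCurrent(). */
    interface CurrentCheck {
        boolean isCurrent() throws IOException;
    }

    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    TolerantFreshnessCheck(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Returns true if the reader should be reopened. A FileNotFoundException is
     * swallowed (keep the reader at hand, check again later) until it has occurred
     * maxConsecutiveFailures times in a row, at which point it is rethrown.
     */
    boolean needsReopen(CurrentCheck check) throws IOException {
        try {
            boolean current = check.isCurrent();
            consecutiveFailures = 0; // a successful check resets the streak
            return !current;
        } catch (FileNotFoundException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxConsecutiveFailures) {
                throw e; // the problem persists; let the application decide
            }
            return false;
        }
    }

    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With a limit of N, the application keeps serving from the existing reader for
up to N - 1 failed checks and only sees the exception if the failures never
stop, which makes the 'forever' scenario visible instead of silent.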
> >
> > I'm not sure if there is anything we can do in Lucene, besides sleeping in
> > FindSegmentsFile for several seconds, which is not reasonable.
> > Maybe a way out would be, I think, having FindSegmentsFile try to read ahead
> > and then backwards. At some point, we ought to find a segment that's
> > readable, even if an old one, no?
> >
> > Any help will be appreciated.
> >
> > Shai
> >
>
