What about doing a read-backwards as well? I.e., if the read-ahead fails, try reading backwards. We must be able to read some segments_N file, even an old one, no?
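To make the idea concrete, here is a rough sketch in plain Java. It is not Lucene's actual FindSegmentsFile code: the class name, the canRead callback and the lookahead parameter are all made up for illustration. Only the segments_N naming (generation written in base 36) follows what Lucene does.

```java
import java.util.OptionalLong;
import java.util.function.LongPredicate;

final class SegmentsProbe {

    /** Lucene names commit files segments_N, with the generation N in base 36. */
    static String segmentsFileName(long gen) {
        return "segments_" + Long.toString(gen, Character.MAX_RADIX);
    }

    /**
     * Hypothetical probe: try startGen, then up to `lookahead` newer
     * generations, then fall back to older generations down to 1.
     * `canRead` stands in for "open segments_N and read it successfully".
     */
    static OptionalLong findReadableGeneration(long startGen, int lookahead, LongPredicate canRead) {
        // Read ahead: a writer may have committed newer generations meanwhile.
        for (long gen = startGen; gen <= startGen + lookahead; gen++) {
            if (canRead.test(gen)) {
                return OptionalLong.of(gen);
            }
        }
        // Read backwards: settle for an older commit if one is still readable.
        for (long gen = startGen - 1; gen >= 1; gen--) {
            if (canRead.test(gen)) {
                return OptionalLong.of(gen);
            }
        }
        return OptionalLong.empty();
    }
}
```

One caveat: reading backwards only helps if older commits are still on disk. As far as I know the default deletion policy (KeepOnlyLastCommitDeletionPolicy) removes older segments_N files once a new commit is written, so the backward pass could well come up empty.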
Shai

On Thu, Aug 13, 2009 at 9:29 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> This is spooky -- it looks like SMB2 (which was introduced with Windows
> Vista & Windows Server 2008) now does "aggressive" client-side
> caching, such that the cache can be wrong about the current state of
> the directory.
>
> At least it sort of sounds like Microsoft considers it a real issue:
>
> > Yes, this is a known product issue of SMB2.
> >
> > SMB2 does implicit attribute and directory metadata caching at all
> > times, whereas SMB1 was much stricter about when it would do so. The
> > caches are consistent when changes are made by the client, but if
> > changes are made from another client they may not be reflected until
> > the cache times out.
>
> This will definitely cause problems (like the exception you're
> hitting) for Lucene. It's exactly the same problems we had with NFS,
> but the readahead in SegmentInfos.FindSegmentsFile worked around that.
> It sounds like for SMB2 that readahead is not working, presumably
> because (unlike NFS) the client does not check back w/ the server if
> it believes (based on its stale cache) that the file does not exist. Sigh.
>
> SMB1 did not have this problem, in my experience.
>
> I wonder if, from javaland, we have some way to force the cache to
> become coherent.
>
> One simple workaround at the app level is to simply retry on hitting
> an errant "segments_N file not found" exception.
>
> Mike
>
> On Thu, Aug 13, 2009 at 8:50 AM, Shai Erera<ser...@gmail.com> wrote:
> > Hi
> >
> > Has anyone experienced any problems w/ Lucene indexes on a shared SMB2
> > network drive?
> >
> > We've hit a scenario where it seems the FS cache refuses to check for
> > existence of files on the shared network drive. Specifically, we hit the
> > following exception:
> >
> > java.io.FileNotFoundException: Z:\index\segments_p8 (The system cannot find the file specified.)
> >     at java.io.RandomAccessFile.open(Native Method)
> >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
> >     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
> >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
> >     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
> >     at org.apache.lucene.index.SegmentInfos$2.doBody(SegmentInfos.java:369)
> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
> >     at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:366)
> >     at org.apache.lucene.index.DirectoryIndexReader.isCurrent(DirectoryIndexReader.java:188)
> >     at org.apache.lucene.index.MultiReader.isCurrent(MultiReader.java:352)
> >
> > The environment:
> > * 3 Windows Server 2008 machines
> > ** Machine A - hosts the index
> > ** Machine B - indexes and search
> > ** Machine C - just search
> > * Machine A and B map Machine C on drive Z.
> > * The exception happens on Machine C only, i.e. on the machine that does
> > just 'search'.
> >
> > According to my understanding, FindSegmentsFile attempts to read the latest
> > segment from segments.gen and directory listing and if there is a problem,
> > it will do a gen-readahead until success or defaultGenLookaheadCount is
> > exhausted.
> >
> > So by hitting this exception we thought of the following explanation: the FS
> > cache 'decides' the file does not exist, due to a stale directory cache, and
> > refuses to check whether the file actually exists on the remote machine.
> >
> > Does that sound reasonable?
> >
> > Some more information:
> > * We use Lucene 2.4.0
> > * Other runs are executed on those machines currently, and so it will take
> > about a week until we can run the same scenario again.
> > I thought that perhaps we can discuss this until then.
> > * Unfortunately we weren't able to get an infoStream output before the
> > machines started another run, so we hope to get it next time. Anyway, it's
> > not easily reproduced.
> > * There isn't any other process which touches this directory, such that it
> > may remove index files.
> >
> > We know the same code runs well on NFS (4). We haven't checked yet if SMB
> > 1.0 works ok. Some pointers we've found:
> >
> > A known issue on MS, w/ some C++ fixes:
> > http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.win32.programmer.networks&tid=69e63e38-7d91-4306-ab6e-a615e1c6afaa&cat=en_US_bc89adf4-f184-4d3d-aaee-122567385744&lang=en&cr=US&sloc=&p=1
> >
> > Info on how to disable SMB 2.0 on Windows:
> > http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm
> >
> > Currently, we think to bypass the problem by wrapping calls to isCurrent and
> > reopen w/ a try-catch FileNotFoundException and use the reader we have at
> > hand. Later, we will attempt the isCurrent again. Since SMB caching seems to
> > be time-controlled, we expect the cache to be refreshed after several
> > seconds, and those calls will succeed.
> > I wonder though if this can't get us into hitting the exception 'forever'.
> > E.g., imagine a system which indexes at very high rates. Isn't it possible
> > that we'll hit this exception every time we call isCurrent?
> >
> > I'm not sure if there is anything we can do in Lucene, besides sleeping in
> > FindSegmentsFile for several seconds which is not reasonable.
> > Maybe a way out would be, I think, having FindSegmentsFile try to read ahead
> > and then backwards. At some point, we ought to find a segment that's
> > readable, even if an old one, no?
> >
> > Any help will be appreciated.
> >
> > Shai
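
For reference, the app-level try-catch workaround described in the quoted message could look roughly like the sketch below. It is only an illustration under assumptions: CurrentCheck is a hypothetical stand-in for a call such as reader.isCurrent(), and the cap on consecutive failures is my own addition so that a cache that never refreshes (the 'forever' case) eventually surfaces as an error instead of being swallowed silently.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class StaleCacheTolerantCheck {

    /** Hypothetical stand-in for a call such as IndexReader.isCurrent(). */
    interface CurrentCheck {
        boolean isCurrent() throws IOException;
    }

    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    StaleCacheTolerantCheck(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Returns the result of the check. On FileNotFoundException it reports
     * "current" so the caller keeps using the reader it already has, and
     * only rethrows once the failure has repeated too many times in a row.
     */
    boolean isCurrent(CurrentCheck check) throws IOException {
        try {
            boolean current = check.isCurrent();
            consecutiveFailures = 0; // the directory view looks coherent again
            return current;
        } catch (FileNotFoundException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxConsecutiveFailures) {
                throw e; // probably not just a stale cache any more
            }
            return true; // keep the current reader, try again on the next poll
        }
    }
}
```

The caller would poll this on its usual schedule; since the SMB2 cache seems to be time-controlled, spacing the polls a few seconds apart gives it a chance to refresh between attempts.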