I am not happy with complicating the readers like this: it conceptually adds back commit locks (for deletion), this time with a keep-alive thread, and again makes readers not read-only.

To my understanding, the only remaining issue with NFS is that a reader might get an IO exception if the writer removes an old file that the reader is still using. It is not possible corruption that we are trying to solve, right? For that, I don't think it is worth adding that machinery again. A "two steps" policy on the writer side, deleting only files that would not still be in use unless a reader had gone X minutes without refreshing, is "fair enough", I think. By "two steps" I mean: start measuring time not from when the segment to be deleted was created, but from when its "next generation" was created.

Michael McCandless <[EMAIL PROTECTED]> wrote on 18/01/2007 14:24:16:

> Marvin Humphrey wrote:
> >
> > On Jan 17, 2007, at 1:16 PM, Michael McCandless wrote:
> >
> >> This is the solution I have in mind for LUCENE-710: change the
> >> IndexFileDeleter so that instead of always immediately deleting the
> >> last commit when a new commit happens, allow some time before doing
> >> so. This way readers have a chance to refresh. The actual time would
> >> be settable by the developer. So if you set it to 6 hours, then, a
> >> commit would remain usable for at least 6 hours after it had been
> >> obsoleted by a new commit. This means if you can ensure your readers
> >> refresh within 6 hours of a new commit happening, then the writer will
> >> never delete an "in-use" commit.
> >
> > I've been mulling this over. If you set the interval to 6 hours, and
> > there's a lot of churn (e.g. if you optimize frequently), you'll end up
> > with a lot of wasted disk space. On the flip side, the user has to set
> > up some sort of trigger for refreshing the IndexReaders anyway. It's
> > still not user-friendly by default, and we'd be polluting the API with a
> > hateful workaround.
>
> Well, 6 hours would be a long time for such a high turnover site.
> They would presumably set the time to something like 10 minutes
> instead.
>
> I think we should decouple the deletion policy from commits.
> This way developers could subclass and make their own deletion policy that
> suits their application. The IndexFileDeleter base class would do all
> the legwork to keep ref counts to all specific index files based on
> all segments_N commits that are still "live". Then the deletion
> policy just decides which commits should be deleted, when. (This is
> roughly what's outlined in LUCENE-710).
>
> The current policy is to delete all prior commits after a new commit
> and that would remain the default.
>
> Chuck's idea (reference counting via filesystem) would be another
> policy. My proposal (delete by time after being obsoleted) would be
> another policy, etc.
>
> > The real problem is NFS. For background, see
> > <http://nfs.sourceforge.net/#section_d>, item D2, which deals with NFS
> > and "delete on last close".
> >
> > Now I wonder. Version 4 of the NFS protocol introduces state, so it's
> > possible to implement file locking. Can we lock a segments file, then
> > have IndexFileDeleter detect which segments are locked that way? And if
> > that's the case, can we detect whether the locking mechanism is failing
> > and throw an exception if someone tries to use an earlier version of NFS?
>
> Locking and NFS makes me very nervous :)
>
> > I'd be cool with making it impossible to put an index on an NFS volume
> > prior to version 4. That puts the blame where it belongs.
>
> Well, most times users have no control over which NFS server and/or
> client version is in use, so I think taking this approach of "pinning
> the blame" can only hurt our users. I would rather find a solution
> that's more portable, if we can (like the ref counting idea Chuck
> brought up).
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
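
P.S. To make the "two steps" timing at the top of this message concrete, here is a minimal sketch in Java. It is only an illustration under my own assumptions: the names ExpirationDeletionSketch, Commit, deletable and graceMillis are hypothetical and are not part of Lucene or of any LUCENE-710 patch. The only point it tries to show is that the clock for an old commit starts when its successor was created, not when the old commit itself was created.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a "delete by time after being obsoleted" policy.
 * None of these names are Lucene API; they exist only for illustration.
 */
class ExpirationDeletionSketch {

    /** One commit point (a segments_N generation) and the time it was written. */
    static final class Commit {
        final long generation;
        final long createdMillis;

        Commit(long generation, long createdMillis) {
            this.generation = generation;
            this.createdMillis = createdMillis;
        }
    }

    /**
     * Returns the commits that may be deleted at time nowMillis.
     * The list must be ordered oldest first. A commit becomes deletable once
     * graceMillis have passed since its successor ("next generation") was
     * created. The newest commit has no successor and is never returned.
     */
    static List<Commit> deletable(List<Commit> commits, long nowMillis, long graceMillis) {
        List<Commit> result = new ArrayList<Commit>();
        for (int i = 0; i + 1 < commits.size(); i++) {
            // "Two steps": measure from the successor's creation time.
            long obsoletedAt = commits.get(i + 1).createdMillis;
            if (nowMillis - obsoletedAt >= graceMillis) {
                result.add(commits.get(i));
            }
        }
        return result;
    }
}
```

With graceMillis set to 10 minutes, a reader that refreshes within 10 minutes of each new commit would never see its files removed under this sketch, and the writer needs no cooperation from readers at all. The cost is the one Marvin points out: under heavy churn, obsolete commits occupy disk space until their grace period runs out.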