Sounds good to me. So it is IndexFileDeleter that applications can use to get "their" NFS-safe behavior, namely preventing premature file deletions. Cool. We could probably write one such alternative at some point, maybe even in contrib.
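As a rough illustration of that extension point, here is a minimal sketch of the split Mike describes in the quoted thread. A base class does the legwork of ref-counting index files across all live segments_N commits, and a pluggable policy only decides which commits to drop. All names here (CommitPoint, DeletionPolicy, KeepOnlyLastCommit, RefCountingDeleter) are hypothetical and are not Lucene's API. The file deletion is only simulated by recording names.

```java
import java.util.*;

/** Hypothetical commit: a segments_N file plus the index files it references. */
final class CommitPoint {
    final String segmentsFile;
    final Set<String> files;

    CommitPoint(String segmentsFile, Collection<String> files) {
        this.segmentsFile = segmentsFile;
        this.files = new LinkedHashSet<>(files); // keep insertion order for predictable output
    }
}

/** Hypothetical pluggable policy: given live commits (oldest first), choose which to drop. */
interface DeletionPolicy {
    List<CommitPoint> commitsToDelete(List<CommitPoint> liveCommits);
}

/** The behavior described as the default: keep only the newest commit. */
final class KeepOnlyLastCommit implements DeletionPolicy {
    public List<CommitPoint> commitsToDelete(List<CommitPoint> live) {
        return new ArrayList<>(live.subList(0, Math.max(0, live.size() - 1)));
    }
}

/** Tracks ref counts per file; a file is deleted only when no live commit references it. */
final class RefCountingDeleter {
    private final DeletionPolicy policy;
    private final List<CommitPoint> live = new ArrayList<>();
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final List<String> deletedFiles = new ArrayList<>();

    RefCountingDeleter(DeletionPolicy policy) {
        this.policy = policy;
    }

    /** Called after each new commit has been written. */
    void onCommit(CommitPoint commit) {
        live.add(commit);
        for (String f : commit.files) {
            refCounts.merge(f, 1, Integer::sum);
        }
        for (CommitPoint old : policy.commitsToDelete(new ArrayList<>(live))) {
            if (!live.remove(old)) {
                continue; // policy returned something that is no longer live
            }
            deleteFile(old.segmentsFile);
            for (String f : old.files) {
                int remaining = refCounts.merge(f, -1, Integer::sum);
                if (remaining == 0) {
                    refCounts.remove(f);
                    deleteFile(f);
                }
            }
        }
    }

    // Stand-in for a real directory delete; this sketch only records the name.
    private void deleteFile(String name) {
        deletedFiles.add(name);
    }

    List<String> deletedFiles() {
        return Collections.unmodifiableList(deletedFiles);
    }

    int liveCommitCount() {
        return live.size();
    }
}
```

With this split, an NFS-friendly alternative only has to implement the policy; it never has to reason about which individual files are shared between commits.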
But should enabling this way of extending IndexFileDeleter be part of the coming 2.1 release, or is it just a future wish? I am not sure the current interfaces of/with IndexFileDeleter are sufficient for this:

1) IndexWriter does not expose setDeleter(). The deleter should probably be passed in the constructor, because files are already deleted (or found) at that point.

2) IndexReader allows setting the deleter, but only after the reader is open. This is okay for its role in commit() (deleting), but it might be too late for its new role (touching): some writer may decide to delete files in between.

There are more questions, but there is no point in getting to them unless this extensibility is intended for 2.1. (?)

Michael McCandless <[EMAIL PROTECTED]> wrote on 18/01/2007 17:37:57:

> Doron Cohen wrote:
> > I am not happy with complicating the readers like this, conceptually
> > adding back commit locks (for deletion), this time with a keep-alive
> > thread, and again making readers not read-only.
> >
> > To my understanding the only remaining issue with NFS is: a reader
> > might get an IO exception in case the writer removed an old file that
> > the reader is using.
> >
> > It is not a possible corruption that we try to solve, right?
> >
> > For that I think it is not worth adding that stuff again.
> >
> > A writer's "two steps" policy (delete only files that
> > "would not be in use unless a reader failed to refresh for X minutes")
> > is "fair enough" I think.
> >
> > By "two steps" I mean: start measuring time not from when the segment
> > to be deleted was created, but rather from when its "next generation"
> > was created.
>
> Right, this was my original proposed deletion policy (below) for
> things to work on NFS.
>
> It does assume/require your application can refresh readers within the
> specified time period. A commit (and any segments that then ref count
> to zero) gets deleted after it has been "obsoleted" for more than X
> minutes.
>
> Even though it's not perfect (progress not perfection!), I like it the
> best of the three options discussed on this thread so far because 1)
> it leaves the readers read-only, and 2) it should work on all versions
> of NFS.
>
> This would just be a different deletion policy, and it wouldn't be the
> default one. We would leave the default as "keep only the last commit
> and delete the old one immediately", for backwards compatibility.
>
> Finally, an application can always make its own deletion policy
> (subclass IndexFileDeleter) if it needs to.
>
> Mike
>
> > Michael McCandless <[EMAIL PROTECTED]> wrote on 18/01/2007
> > 14:24:16:
> >
> >> Marvin Humphrey wrote:
> >>> On Jan 17, 2007, at 1:16 PM, Michael McCandless wrote:
> >>>
> >>>> This is the solution I have in mind for LUCENE-710: change the
> >>>> IndexFileDeleter so that instead of always immediately deleting the
> >>>> last commit when a new commit happens, it allows some time before
> >>>> doing so. This way readers have a chance to refresh. The actual time
> >>>> would be settable by the developer. So if you set it to 6 hours, then
> >>>> a commit would remain usable for at least 6 hours after it had been
> >>>> obsoleted by a new commit. This means if you can ensure your readers
> >>>> refresh within 6 hours of a new commit happening, then the writer will
> >>>> never delete an "in-use" commit.
> >>> I've been mulling this over. If you set the interval to 6 hours, and
> >>> there's a lot of churn (e.g. if you optimize frequently), you'll end up
> >>> with a lot of wasted disk space. On the flip side, the user has to set
> >>> up some sort of trigger for refreshing the IndexReaders anyway. It's
> >>> still not user-friendly by default, and we'd be polluting the API with a
> >>> hateful workaround.
> >> Well, 6 hours would be a long time for such a high turnover site.
> >> They would presumably set the time to something like 10 minutes
> >> instead.
> >>
> >> I think we should decouple the deletion policy from commits. This way
> >> developers could subclass and make their own deletion policy that
> >> suits their application. The IndexFileDeleter base class would do all
> >> the legwork to keep ref counts to all specific index files based on
> >> all segments_N commits that are still "live". Then the deletion
> >> policy just decides which commits should be deleted, and when. (This is
> >> roughly what's outlined in LUCENE-710.)
> >>
> >> The current policy is to delete all prior commits after a new commit,
> >> and that would remain the default.
> >>
> >> Chuck's idea (reference counting via the filesystem) would be another
> >> policy. My proposal (delete by time after being obsoleted) would be
> >> another policy, etc.
> >>
> >>> The real problem is NFS. For background, see
> >>> <http://nfs.sourceforge.net/#section_d>, item D2, which deals with NFS
> >>> and "delete on last close".
> >>>
> >>> Now I wonder. Version 4 of the NFS protocol introduces state, so it's
> >>> possible to implement file locking. Can we lock a segments file, then
> >>> have IndexFileDeleter detect which segments are locked that way? And if
> >>> that's the case, can we detect whether the locking mechanism is failing
> >>> and throw an exception if someone tries to use an earlier version of NFS?
> >> Locking and NFS make me very nervous :)
> >>
> >>> I'd be cool with making it impossible to put an index on an NFS volume
> >>> prior to version 4. That puts the blame where it belongs.
> >> Well, most times users have no control over which NFS server and/or
> >> client version is in use, so I think taking this approach of "pinning
> >> the blame" can only hurt our users. I would rather find a solution
> >> that's more portable, if we can (like the ref counting idea Chuck
> >> brought up).
> >>
> >> Mike
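The time-based "two steps" policy discussed in this thread (a commit becomes deletable only after it has been obsoleted by the next generation for more than X minutes) could look roughly like the sketch below. TimedCommit and ExpireAfterObsoletedPolicy are hypothetical names, not Lucene classes. The sketch only makes the decision; the actual ref-counted file removal is assumed to happen elsewhere. As noted in the thread, X must exceed the longest time any reader may go without refreshing.

```java
import java.util.*;

/** Hypothetical view of a commit: its segments_N name and when it was written (ms). */
final class TimedCommit {
    final String segmentsFile;
    final long createdMillis;

    TimedCommit(String segmentsFile, long createdMillis) {
        this.segmentsFile = segmentsFile;
        this.createdMillis = createdMillis;
    }
}

/**
 * A commit becomes obsolete when the next commit is written, and is eligible
 * for deletion once it has been obsolete for longer than maxObsoleteMillis.
 * The newest commit is never returned.
 */
final class ExpireAfterObsoletedPolicy {
    private final long maxObsoleteMillis;

    ExpireAfterObsoletedPolicy(long maxObsoleteMillis) {
        this.maxObsoleteMillis = maxObsoleteMillis;
    }

    /** commits: live commits, oldest first; nowMillis: the current time. */
    List<TimedCommit> commitsToDelete(List<TimedCommit> commits, long nowMillis) {
        List<TimedCommit> result = new ArrayList<>();
        for (int i = 0; i + 1 < commits.size(); i++) {
            // "Two steps": measure from when the next generation appeared,
            // not from when this commit itself was created.
            long obsoletedAt = commits.get(i + 1).createdMillis;
            if (nowMillis - obsoletedAt > maxObsoleteMillis) {
                result.add(commits.get(i));
            }
        }
        return result;
    }
}
```

For example, with a 10 minute window, a commit written at minute 0 and superseded at minute 30 survives until just after minute 40, even though it is already 30 minutes old when it is superseded. A reader that opened it shortly before minute 30 therefore still gets a full 10 minutes to refresh.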