I've been looking into whether it is possible to check a Lucene index for corruption. It doesn't matter how the corruption occurs: JVM crashes, bad file copying, or whatever. I found an old thread on this mailing list on the subject, from before Lucene 1.2, over 3 years ago. In it, it was suggested that a corruption-checking tool might be written. Does anyone know if anything came of this?

Thanks
Shane

----------------------------------------------------------------------------------------------------
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Monday, April 02, 2002 11:51:42 GMT
To: lucene-dev@jakarta.apache.org
Cc: [EMAIL PROTECTED]
Subject: RE: corrupted index

Doug,

Yep, I think waiting until after 1.2 would be a good idea. As I find
time over the next couple of weeks, I'll try to start putting together
a proposal. A good short-term improvement would be to document the
usage of IOException in the Javadocs and explain when it might occur.
In terms of subclassing IOException -- sounds like it could be a good
approach.

Regards,
Matt

> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 02, 2002 11:24 AM
> To: 'Lucene Developers List'
> Subject: RE: corrupted index
>
> Matt,
>
> I'd welcome a concrete proposal in this area. Probably we should
> wait until we have a final 1.2 release out there before making such
> changes. Note that this could be done compatibly if the new
> exceptions subclass java.io.IOException.
>
> Doug
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> > Sent: Monday, April 01, 2002 9:06 PM
> > To: lucene-dev@jakarta.apache.org
> > Cc: [EMAIL PROTECTED]
> > Subject: RE: corrupted index
> >
> > I changed the recipient from -user to -dev list, as that seems
> > more appropriate. I think this would not be a bad idea, if we do
> > it right. Things like IndexLockedException, etc. sound alright to
> > me. I think Doug once welcomed such a change on one of the lists,
> > too.
> >
> > Perhaps a list of suggested exceptions, new exception classes and
> > appropriate patches would be the best contribution.
> >
> > Thanks,
> > Otis
> >
> > --- Matt Tucker <[EMAIL PROTECTED]> wrote:
> > > Hey all,
> > >
> > > Actually, using shutdown hooks might not be the best idea since
> > > Lucene is very often used in server-side Java environments. Many
> > > app-servers throw security errors when trying to add shutdown
> > > hooks, and I've seen Weblogic crash before when having them in a
> > > webapp. Has anyone else run into this?
> > >
> > > This all brings up a key issue with Lucene, which is that there
> > > is little way to recover from errors gracefully. I'd love to see
> > > a number of checked exceptions added. For example:
> > >
> > > IndexNotFoundException -- when trying to open an index that
> > >   doesn't exist
> > > IndexLockedException -- when a lock file prevents you from
> > >   getting an index
> > > IndexCorruptException -- maybe this would be thrown when an
> > >   index appears to be broken?
> > >
> > > At the moment, Lucene throws many undocumented IOExceptions and
> > > even NullPointerExceptions when an error case comes up. I catch
> > > these in my app, but there's really not an intelligent way to
> > > recover from them. Adding checked exceptions would be a change
> > > of the API, but it seems worth it. I'd be happy to make a more
> > > specific proposal if other people feel like this would be a
> > > worthwhile direction to go in.
> > >
> > > Regards,
> > > Matt
> > >
> > > Quoting "Spencer, Dave" <[EMAIL PROTECTED]>:
> > >
> > > > Runtime.addShutdownHook:
> > > >
> > > > http://java.sun.com/j2se/1.3/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
> > > >
> > > > -----Original Message-----
> > > > From: Otis Gospodnetic [ mailto:[EMAIL PROTECTED]
> > > > Sent: Sunday, March 17, 2002 12:06 AM
> > > > To: Lucene Users List
> > > > Subject: Re: corrupted index
> > > >
> > > > Oh, I just thought of something (wine does body good).
> > > > Perhaps one could use Runtime (the class) to catch the JVM
> > > > shutdown and do whatever is needed to prevent index
> > > > corruption. I believe there are some shutdown hook methods in
> > > > there that may let you do that. I'm too lazy to look up the
> > > > API docs now, but I remember reading about that once, and
> > > > perhaps it was even mentioned on one of the 2 Lucene mailing
> > > > lists.
> > > >
> > > > On the other hand, it would be great to have a tool that can
> > > > verify an existing index. I don't know enough about the actual
> > > > file structure yet to write something like that, but maybe
> > > > somebody else has done that already or would like to
> > > > contribute.
> > > >
> > > > Otis
> > > >
> > > > --- "Steven J. Owens" <[EMAIL PROTECTED]> wrote:
> > > > > Otis,
> > > > >
> > > > > > You can remove the .lock file and try re-indexing or
> > > > > > continuing indexing where you left off.
> > > > > > I am not sure about the corrupt index. I have never seen
> > > > > > it happen, and I believe I recall reading some messages
> > > > > > from Doug Cutting saying that index should never be left
> > > > > > in an inconsistent state.
> > > > >
> > > > > Obviously never "should" be, but if something's pulling the
> > > > > rug out from under his JRE, changes could be only partially
> > > > > written, right?
> > > > >
> > > > > Or is the writing format in some sense transactionally safe?
> > > > > I've never worked directly on something like this, but I
> > > > > worked at a database software company where they used
> > > > > transaction semantics and a journaling scheme to fake a
> > > > > "bulletproof" file system. Is this how the index-writing
> > > > > code is implemented?
> > > > >
> > > > > In general, I can guess Doug's response - just torch the old
> > > > > index directory and rebuild it; Lucene's indexing is fast
> > > > > enough that you don't need to get clever. This seems to be
> > > > > Doug's stance in general (i.e. "don't get fancy, I already
> > > > > put all the fanciness you'll need into extremely fast
> > > > > indexing and searching"). So far, it seems to work :-).
> > > > >
> > > > > > I could be making this up, though, so I suggest you search
> > > > > > through lucene-user and lucene-dev archives on
> > > > > > www.mail-archive.com <http://www.mail-archive.com>. A
> > > > > > search for "corrupt" should do it. Once you figure things
> > > > > > out maybe you can post a summary here.
> > > > >
> > > > > I got a little curious, so I went and did the searches.
> > > > > There is exactly one message in each list archive (dev and
> > > > > users) with the keyword "corrupt" in it. The lucene-users
> > > > > instance is irrelevant:
> > > > >
> > > > > http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
> > > > >
> > > > > The lucene-dev instance is more useful:
> > > > >
> > > > > http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
> > > > >
> > > > > It's a post from Doug, dated sept 27, 2001, about adding not
> > > > > just thread-safety but process-safety:
> > > > >
> > > > > It should be impossible to corrupt an index through the
> > > > > Lucene API. However if a Lucene process exits unexpectedly
> > > > > it can leave the index locked. The remedy is simply to, at a
> > > > > time when it is certain that no processes are accessing the
> > > > > index, remove all lock files.
> > > > >
> > > > > So it sounds like it's worth trying just removing the lock
> > > > > files. Hm, is there a way to come up with a "sanity check"
> > > > > you can run on an index to make sure it's not corrupted?
> > > > > This might be an excellent thing to reassure yourself with:
> > > > > something went wrong? Run a sanity check, if it fails just
> > > > > reindex.
> > > > >
> > > > > Steven J. Owens
> > > > > [EMAIL PROTECTED]
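
To make the exception proposal quoted in this thread a little more concrete, here is a minimal Java sketch of how the three suggested exceptions could subclass java.io.IOException, so that callers which already catch IOException keep compiling and working. It is not Lucene code. The three exception names are the ones suggested in the thread; the constructors, the IndexExceptionSketch class, the openIndex stand-in, and its string argument are hypothetical and exist only for illustration.

```java
import java.io.IOException;

// Exception names are taken from the proposal in the thread. They are
// illustrative only and are not claimed to exist in any Lucene release.
class IndexNotFoundException extends IOException {
    IndexNotFoundException(String message) { super(message); }
}

class IndexLockedException extends IOException {
    IndexLockedException(String message) { super(message); }
}

class IndexCorruptException extends IOException {
    IndexCorruptException(String message) { super(message); }
}

class IndexExceptionSketch {

    // Hypothetical stand-in for opening an index; "state" simulates the
    // condition found on disk. This is not a Lucene API.
    static void openIndex(String state) throws IOException {
        if ("missing".equals(state)) {
            throw new IndexNotFoundException("no index found");
        } else if ("locked".equals(state)) {
            throw new IndexLockedException("lock file present");
        } else if ("corrupt".equals(state)) {
            throw new IndexCorruptException("index appears to be broken");
        }
        // Any other state is treated as a healthy index.
    }

    // A caller written against the new exceptions can pick a recovery
    // strategy per failure type.
    static String recoveryFor(String state) {
        try {
            openIndex(state);
            return "none needed";
        } catch (IndexNotFoundException e) {
            return "create a new index";
        } catch (IndexLockedException e) {
            return "remove stale lock files once no process uses the index";
        } catch (IndexCorruptException e) {
            return "discard the index and reindex";
        } catch (IOException e) {
            return "unknown I/O failure";
        }
    }

    // A caller written before the change catches only IOException and
    // still compiles and behaves as before.
    static boolean legacyOpen(String state) {
        try {
            openIndex(state);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(recoveryFor("locked"));
        System.out.println(legacyOpen("corrupt"));
    }
}
```

The point of the sketch is the compatibility argument made in the thread: because each new exception is an IOException, code like legacyOpen needs no changes, while newer code like recoveryFor can distinguish a stale lock from a broken index.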