Re: restoring a corrupt index?

Michael McCandless Sat, 10 Nov 2007 14:41:03 -0800

"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 10, 2007 5:01 PM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> > "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> >
> > > > How can this lead to index corruption?  The "no such file or directory"
> > > > on loading _cf9.fnm sounds like index corruption?
> > >
> > > I don't think older versions of lucene handled these errors as well.
> > > Perhaps _cf9.fnm failed to be written, but the segments file succeeded.
> > > It could also be Solr's fault for allowing further operations on an
> > > index after one failed?  I'm not sure how that should be handled.
> >
> > Ahh, OK.
> >
> > Yeah it's not clear how Solr should handle this case, though Lucene
> > should be at (get to?) the point where an exception can do no harm
> > to the index.  I.e., the index on disk is left consistent, and the
> > state of the writer is such that it can't corrupt the index even if
> > it's used further after the exception.
> 
> Right... keep in mind concurrent updates too... it's pretty much
> impossible at the user level to prevent other calls to addDocument()
> from going ahead, since they can be in different threads.


Ahh, true.

> > OK, though, writer doesn't use many descriptors at all, right?
> 
> Yeah, but other things might be using up the descriptors... there is
> probably an open searcher for searching, possibly another one warming
> in the background, plus whatever else is going on in the servlet
> container.

OK.

> Ryan, are you using stock Solr, or embedded?  Is there anything that
> might prevent old searchers from being closed?  (You could check the
> log files for #opens vs #closes.)
> 
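
The opens-versus-closes check suggested above can be sketched roughly
as follows.  This is only a minimal sketch, not Solr tooling: the marker
strings "Opening Searcher@" and "Closing Searcher@" are assumptions about
what the log prints, and the class name is hypothetical.  Adjust both to
match your own log files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SearcherOpenCloseCount {
    // Assumed log markers; change these to whatever your Solr log actually prints.
    static final String OPEN_MARKER = "Opening Searcher@";
    static final String CLOSE_MARKER = "Closing Searcher@";

    // Returns {opens, closes} counted over all lines of the given log.
    static int[] count(BufferedReader log) throws IOException {
        int opens = 0;
        int closes = 0;
        String line;
        while ((line = log.readLine()) != null) {
            if (line.contains(OPEN_MARKER)) opens++;
            if (line.contains(CLOSE_MARKER)) closes++;
        }
        return new int[] {opens, closes};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SearcherOpenCloseCount <logfile>");
            return;
        }
        try (BufferedReader log = Files.newBufferedReader(Paths.get(args[0]))) {
            int[] c = count(log);
            System.out.println("opens=" + c[0] + " closes=" + c[1]);
        }
    }
}
```

If opens keeps running well ahead of closes, old searchers (and the
descriptors they hold) are probably not being released.
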
> > It opens 1 segment's worth to flush, and mergeFactor+1 segments'
> > worth to merge.  It's spooky if Lucene can create zillions of
> > un-referenced files.
> 
> Perhaps unreferenced files may not be deleted if an exception is
> encountered first, or perhaps even the deleter is failing due to lack
> of descriptors.

OK.  Still spooky that this failure mode is possible on 2.2.
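
To put rough numbers on the writer's descriptor usage quoted above (one
segment's worth to flush, mergeFactor+1 segments' worth to merge), here
is a minimal sketch.  The class and method names are hypothetical, and
filesPerSegment is an assumed input: the real number depends on the
index format and the fields in use.

```java
final class WriterDescriptorEstimate {
    // Flushing opens one segment's worth of files.
    static int forFlush(int filesPerSegment) {
        return filesPerSegment;
    }

    // Merging opens mergeFactor input segments plus one output segment.
    static int forMerge(int mergeFactor, int filesPerSegment) {
        return (mergeFactor + 1) * filesPerSegment;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;     // Lucene's default mergeFactor
        int filesPerSegment = 8;  // assumption; varies with format and fields
        System.out.println("flush: " + forFlush(filesPerSegment));
        System.out.println("merge: " + forMerge(mergeFactor, filesPerSegment));
    }
}
```

Either figure is small next to a typical per-process descriptor limit,
which is why the writer alone seems an unlikely culprit.
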

> > Would be good to get to the root cause & make sure it's really
> > fixed on trunk.
> 
> Is there a tool we could have Ryan point at the segments file and get
> a dump of the referenced segments?

LUCENE-1020 will print the details of all segments ... hmmm, I see
that I'm not closing each SegmentReader in that tool.  OK, I just committed
a fix.  Otherwise I think Ryan would hit the file descriptor limit (if these
files are referenced).

Ryan, are you able to update to the commit I just made?  If so, I think
you should run the tool without -fix and post back what it prints.  It
should report an error on that one segment due to the missing file.
Then, run -fix to remove that segment (please back up your index first!).
Then, if you have a zillion segments in the index, try optimizing it?

Oh, hang on: you can't run -fix because this tool will write a
segments_N file in the current trunk format, which is different
from 2.2.x.  It would not be hard to port the tool back to 2.2:
just comment out those System.out.println's that hit compilation
errors.
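
For the "back up your index first" step, a minimal sketch of copying the
index directory before anything rewrites it.  It assumes Java 7+ NIO, a
flat index directory with no subdirectories, and that no writer is
modifying the index while the copy runs.  The class name and the
command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class IndexBackup {
    // Copies every regular file in indexDir into backupDir (created if missing)
    // and returns the number of files copied.  An existing file in backupDir is
    // never overwritten: Files.copy throws FileAlreadyExistsException instead.
    static int copyIndex(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Path target = backupDir.resolve(file.getFileName().toString());
                    Files.copy(file, target, StandardCopyOption.COPY_ATTRIBUTES);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java IndexBackup <indexDir> <backupDir>");
            return;
        }
        int n = copyIndex(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("copied " + n + " files");
    }
}
```

Refusing to overwrite is deliberate: a second run against the same
backup directory fails loudly rather than replacing a good copy with a
possibly damaged one.
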

> > Or, maybe they are all referenced -- I think there were issues with
> > the old merge policy that could cause segments to not be merged when
> > they should have been?
> 
> That should not have been the case in Lucene 2.2 I think.

Ahh, right.

Mike
