Using a known-broken Lucene index directory, I dropped down to the Lucene
API and tracked this down a bit further.

My directory listing is this:

----------------
17 Mar 13:39 _8w.fdt
17 Mar 13:39 _8w.fdx
17 Mar 13:39 _8w.fnm
17 Mar 13:39 _8w.nvd
17 Mar 13:39 _8w.nvm
17 Mar 13:39 _8w.si
17 Mar 13:39 _8w_Lucene50_0.doc
17 Mar 13:39 _8w_Lucene50_0.pos
17 Mar 13:39 _8w_Lucene50_0.tim
17 Mar 13:39 _8w_Lucene50_0.tip
17 Mar 13:39 _8w_Lucene70_0.dvd
17 Mar 13:39 _8w_Lucene70_0.dvm
17 Mar 14:33 _8x.cfe
17 Mar 14:33 _8x.cfs
20 Mar 21:19 _8x.fdt
20 Mar 21:19 _8x.fdx
20 Mar 21:19 _8x.fnm
20 Mar 21:19 _8x.nvd
20 Mar 21:19 _8x.nvm
20 Mar 21:19 _8x.si
20 Mar 21:19 _8x_Lucene50_0.doc
20 Mar 21:19 _8x_Lucene50_0.pos
20 Mar 21:19 _8x_Lucene50_0.tim
20 Mar 21:19 _8x_Lucene50_0.tip
20 Mar 21:19 _8x_Lucene70_0.dvd
20 Mar 21:19 _8x_Lucene70_0.dvm
20 Mar 21:19 _8y.cfe
20 Mar 21:19 _8y.cfs
20 Mar 21:19 _8y.si
20 Mar 21:19 _8z.cfe
20 Mar 21:19 _8z.cfs
20 Mar 21:19 _8z.si
20 Mar 21:19 _90.cfe
20 Mar 21:19 _90.cfs
20 Mar 21:19 _90.si
20 Mar 21:19 _91.cfe
20 Mar 21:19 _91.cfs
20 Mar 21:19 _91.si
20 Mar 21:19 _92.cfe
20 Mar 21:19 _92.cfs
20 Mar 21:19 _92.si
20 Mar 21:19 _93.cfe
20 Mar 21:19 _93.cfs
20 Mar 21:19 _93.si
20 Mar 21:19 _94.cfe
20 Mar 21:19 _94.cfs
20 Mar 21:19 _94.si
20 Mar 21:19 _95.cfe
20 Mar 21:19 _95.cfs
20 Mar 21:19 _95.si
18 Mar 06:49 segments_93
20 Mar 21:19 segments_96
6 Mar 21:22 write.lock

----------------
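
One way to scan a listing like the one above is to group the files by segment name and flag any segment that has both compound (.cfs/.cfe) and non-compound (.fdt, .fdx, .fnm) files. This is only a rough sketch in plain Java, not a Lucene API: the class and method names are hypothetical, and treating a mixed file set as suspicious is an assumption of mine rather than a check Lucene performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: works on file names only, never opens the index.
public class SegmentFileGroups {

    /** Returns the segment name ("_8x") for "_8x.cfs" or "_8x_Lucene50_0.doc", or null for other files. */
    public static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // segments_N, write.lock, ...
        }
        int end = 1;
        while (end < fileName.length()
                && fileName.charAt(end) != '_'
                && fileName.charAt(end) != '.') {
            end++;
        }
        return fileName.substring(0, end);
    }

    /** Returns the text after the last dot, or "" if there is none. */
    public static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** Maps each segment name to the set of file extensions present for it. */
    public static Map<String, Set<String>> extensionsBySegment(List<String> fileNames) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String fileName : fileNames) {
            String segment = segmentName(fileName);
            if (segment != null) {
                result.computeIfAbsent(segment, k -> new TreeSet<>()).add(extension(fileName));
            }
        }
        return result;
    }

    /** Assumption: a segment with both compound and non-compound files deserves a closer look. */
    public static List<String> mixedSegments(List<String> fileNames) {
        List<String> mixed = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : extensionsBySegment(fileNames).entrySet()) {
            Set<String> exts = entry.getValue();
            boolean compound = exts.contains("cfs") || exts.contains("cfe");
            boolean plain = exts.contains("fdt") || exts.contains("fdx") || exts.contains("fnm");
            if (compound && plain) {
                mixed.add(entry.getKey());
            }
        }
        return mixed;
    }

    public static void main(String[] args) {
        // A subset of the listing above.
        List<String> files = Arrays.asList(
                "_8w.fdt", "_8w.si",
                "_8x.cfe", "_8x.cfs", "_8x.fdt", "_8x.si",
                "_8y.cfe", "_8y.cfs", "_8y.si",
                "segments_93", "segments_96", "write.lock");
        System.out.println(mixedSegments(files)); // prints [_8x]
    }
}
```

Run against the full listing, this would flag only _8x, whose .cfe/.cfs files are dated 17 Mar 14:33 while its other files are dated 20 Mar 21:19.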

When I load SegmentInfos for segments_96 directly, it succeeds, and I can
see that it references every SegmentInfo except _8w.
If I try to load SegmentInfos for segments_93, it gets past loading _8w and
fails on _8x.
Checking with a hex editor, segments_93 references _8w ... _94 and
segments_96 references _8x ... _95.

The IndexWriter failure is due to the IndexFileDeleter attempting to load
segments_93 to track referenced commit infos.
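
To see which commit points a directory still holds, the segments_N file names can be decoded: the suffix is the commit generation written in base 36. The following is a minimal sketch in plain Java that looks only at file names and does not read the commit files themselves; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: lists commit points by generation from file names alone.
public class CommitGenerations {

    private static final String PREFIX = "segments_";

    /** Decodes "segments_96" to 330 (base 36); returns -1 for anything that is not a commit file name. */
    public static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX) || fileName.length() == PREFIX.length()) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns the commit files found, ordered by generation (oldest first). */
    public static TreeMap<Long, String> commits(List<String> fileNames) {
        TreeMap<Long, String> result = new TreeMap<>();
        for (String fileName : fileNames) {
            long generation = generationOf(fileName);
            if (generation >= 0) {
                result.put(generation, fileName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("_8w.si", "segments_93", "segments_96", "write.lock");
        TreeMap<Long, String> commits = commits(files);
        for (Map.Entry<Long, String> entry : commits.entrySet()) {
            boolean latest = entry.getKey().equals(commits.lastKey());
            System.out.println(entry.getValue() + " generation=" + entry.getKey()
                    + (latest ? " (latest)" : " (older commit)"));
        }
    }
}
```

For this directory it reports segments_93 as generation 327 and segments_96 as generation 330, so segments_93 is an older commit that is still present on disk.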

Is this a state an IndexWriter could get the directory into by itself, or
does it involve higher-level interference (like copying files around)?

Tim

On Thu, 14 Apr 2022 at 13:20, Baris Kazar <baris.ka...@oracle.com> wrote:

> Yes, that is a great point to look at first, and it would eliminate any
> JDBC-related issues that may lead to such problems.
> Best regards
> ________________________________
> From: Tim Whittington <t...@apache.org>
> Sent: Wednesday, April 13, 2022 9:17:44 PM
> To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> Subject: Re: How to handle corrupt Lucene index
>
> Thanks for this - I'll have a look at the database server code that is
> managing the Lucene indexes and see if I can track it down.
>
> Tim
>
> On Thu, 14 Apr 2022 at 12:41, Robert Muir <rcm...@gmail.com> wrote:
>
> > On Wed, Apr 13, 2022 at 8:24 PM Tim Whittington
> > <t...@whittington.nz.invalid> wrote:
> > >
> > > I'm working with/on a database system that uses Lucene for full text
> > > indexes (currently using 7.3.0).
> > > > We're encountering occasional problems that occur after unclean
> > > > shutdowns of the database, resulting in
> > > > "org.apache.lucene.index.CorruptIndexException: file mismatch" errors
> > > > when the IndexWriter is constructed.
> > > >
> > > > In all of the cases this has occurred, CheckIndex finds no issues with
> > > > the Lucene index.
> > > >
> > > > The database has write-ahead-log and recovery facilities, so making the
> > > > Lucene indexes durable wrt database operations is doable, but in this
> > > > case the IndexWriter itself is failing to initialise, so it looks like
> > > > there needs to be a lower-level validation/recovery operation before
> > > > reconciling transactions can take place.
> > > >
> > > > Can anyone provide any advice about how the database can detect and
> > > > recover from this situation?
> > >
> >
> > File mismatch means files are getting mixed up. It is the equivalent
> > of swapping say, /etc/hosts and /etc/passwd on your computer.
> >
> > In your case you have a .si file (let's say it is named _79.si) that
> > really belongs to another segment (e.g. _42).
> >
> > This isn't a Lucene issue; this is something else you must be using
> > that is "transporting files around", and it is mixing the files up.
> >
>
