Using a known-broken Lucene index directory, I dropped down to the Lucene API and tracked this down a bit further.
My directory listing is this:

----------------
17 Mar 13:39 _8w.fdt
17 Mar 13:39 _8w.fdx
17 Mar 13:39 _8w.fnm
17 Mar 13:39 _8w.nvd
17 Mar 13:39 _8w.nvm
17 Mar 13:39 _8w.si
17 Mar 13:39 _8w_Lucene50_0.doc
17 Mar 13:39 _8w_Lucene50_0.pos
17 Mar 13:39 _8w_Lucene50_0.tim
17 Mar 13:39 _8w_Lucene50_0.tip
17 Mar 13:39 _8w_Lucene70_0.dvd
17 Mar 13:39 _8w_Lucene70_0.dvm
17 Mar 14:33 _8x.cfe
17 Mar 14:33 _8x.cfs
20 Mar 21:19 _8x.fdt
20 Mar 21:19 _8x.fdx
20 Mar 21:19 _8x.fnm
20 Mar 21:19 _8x.nvd
20 Mar 21:19 _8x.nvm
20 Mar 21:19 _8x.si
20 Mar 21:19 _8x_Lucene50_0.doc
20 Mar 21:19 _8x_Lucene50_0.pos
20 Mar 21:19 _8x_Lucene50_0.tim
20 Mar 21:19 _8x_Lucene50_0.tip
20 Mar 21:19 _8x_Lucene70_0.dvd
20 Mar 21:19 _8x_Lucene70_0.dvm
20 Mar 21:19 _8y.cfe
20 Mar 21:19 _8y.cfs
20 Mar 21:19 _8y.si
20 Mar 21:19 _8z.cfe
20 Mar 21:19 _8z.cfs
20 Mar 21:19 _8z.si
20 Mar 21:19 _90.cfe
20 Mar 21:19 _90.cfs
20 Mar 21:19 _90.si
20 Mar 21:19 _91.cfe
20 Mar 21:19 _91.cfs
20 Mar 21:19 _91.si
20 Mar 21:19 _92.cfe
20 Mar 21:19 _92.cfs
20 Mar 21:19 _92.si
20 Mar 21:19 _93.cfe
20 Mar 21:19 _93.cfs
20 Mar 21:19 _93.si
20 Mar 21:19 _94.cfe
20 Mar 21:19 _94.cfs
20 Mar 21:19 _94.si
20 Mar 21:19 _95.cfe
20 Mar 21:19 _95.cfs
20 Mar 21:19 _95.si
18 Mar 06:49 segments_93
20 Mar 21:19 segments_96
 6 Mar 21:22 write.lock
----------------

When I load SegmentInfos for segments_96 directly, it succeeds, and I can see it's referencing all the SegmentInfo except for _8w.
If I try to load SegmentInfos for segments_93, it gets past loading _8w and fails on _8x.

Checking with a hex editor, segments_93 is referencing _8w ... _94 and segments_96 is referencing _8x ... _95.

The IndexWriter failure is due to the IndexFileDeleter attempting to load segments_93 to track referenced commit infos.

Is this a state an IndexWriter could get the directory into, or does it involve higher level interference (like copying files around)?
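
The mixed state of _8x in the listing above (compound .cfe/.cfs files alongside separate per-format files under the same segment name) can be spotted mechanically. Below is a minimal Java sketch using only the JDK. The names SegmentFileAudit, segmentName, groupBySegment and mixedSegments are hypothetical, not Lucene APIs, and the rule itself is a heuristic assumption: a segment name that owns both a .cfs file and a separate .fdt file is treated as suspicious.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: audits file names from an index directory listing.
class SegmentFileAudit {

    // Returns the segment prefix of a per-segment file, e.g. "_8x_Lucene50_0.doc" -> "_8x".
    // Files that do not start with '_' (segments_N, write.lock) yield null.
    static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.');
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) {
            end = Math.min(end, dot);
        }
        if (underscore >= 0) {
            end = Math.min(end, underscore);
        }
        return fileName.substring(0, end);
    }

    // Groups per-segment files by segment prefix, sorted by prefix.
    static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            String segment = segmentName(name);
            if (segment != null) {
                groups.computeIfAbsent(segment, k -> new ArrayList<>()).add(name);
            }
        }
        return groups;
    }

    // Heuristic (an assumption of this sketch): flag segments that have both a
    // compound data file (.cfs) and a separate stored-fields data file (.fdt).
    static List<String> mixedSegments(List<String> fileNames) {
        List<String> suspicious = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groupBySegment(fileNames).entrySet()) {
            boolean compound = e.getValue().contains(e.getKey() + ".cfs");
            boolean separate = e.getValue().contains(e.getKey() + ".fdt");
            if (compound && separate) {
                suspicious.add(e.getKey());
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // A subset of the listing above.
        List<String> files = Arrays.asList(
                "_8w.fdt", "_8w.si", "_8w_Lucene50_0.doc",
                "_8x.cfe", "_8x.cfs", "_8x.fdt", "_8x.si", "_8x_Lucene50_0.doc",
                "_8y.cfe", "_8y.cfs", "_8y.si",
                "segments_93", "segments_96", "write.lock");
        System.out.println(mixedSegments(files)); // prints [_8x]
    }
}
```

Run against the full listing, this flags only _8x, whose .cfe/.cfs files are dated 17 Mar 14:33 while its .si and per-format files are dated 20 Mar 21:19. It only inspects names, so it cannot say which set of files a given segments_N expects.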

Tim

On Thu, 14 Apr 2022 at 13:20, Baris Kazar <baris.ka...@oracle.com> wrote:

> yes that is a great point to look at first and that would eliminate any
> jdbc related issues that may lead to such problems.
> Best regards
> ________________________________
> From: Tim Whittington <t...@apache.org>
> Sent: Wednesday, April 13, 2022 9:17:44 PM
> To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> Subject: Re: How to handle corrupt Lucene index
>
> Thanks for this - I'll have a look at the database server code that is
> managing the Lucene indexes and see if I can track it down.
>
> Tim
>
> On Thu, 14 Apr 2022 at 12:41, Robert Muir <rcm...@gmail.com> wrote:
>
> > On Wed, Apr 13, 2022 at 8:24 PM Tim Whittington
> > <t...@whittington.nz.invalid> wrote:
> > >
> > > I'm working with/on a database system that uses Lucene for full text
> > > indexes (currently using 7.3.0).
> > > We're encountering occasional problems that occur after unclean
> > > shutdowns of the database, resulting in
> > > "org.apache.lucene.index.CorruptIndexException: file mismatch" errors
> > > when the IndexWriter is constructed.
> > >
> > > In all of the cases this has occurred, CheckIndex finds no issues
> > > with the Lucene index.
> > >
> > > The database has write-ahead-log and recovery facilities, so making the
> > > Lucene indexes durable wrt database operations is doable, but in this
> > > case the IndexWriter itself is failing to initialise, so it looks like
> > > there needs to be a lower-level validation/recovery operation before
> > > reconciling transactions can take place.
> > >
> > > Can anyone provide any advice about how the database can detect and
> > > recover from this situation?
> > >
> >
> > File mismatch means files are getting mixed up. It is the equivalent
> > of swapping say, /etc/hosts and /etc/passwd on your computer.
> >
> > In your case you have a .si file (lets say it is named _79.si) that
> > really belongs to another segment (e.g. _42).
> >
> > This isn't a lucene issue, this is something else you must be using
> > that is "transporting files around", and it is mixing the files up.