Hi,

"segments.gen" no longer exists in Lucene 5.x (a consequence of the move to Java 7 NIO.2). Every commit point (segments_xxx) is also written under a new filename, so a commit never overwrites an existing file.
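For a file-based incremental backup, the practical consequence is that file names can be compared without looking at file contents. Below is a minimal sketch in plain Java (no Lucene API involved). The class name WriteOnceBackup and the method sync are made up for illustration. It assumes that nothing commits to or merges the index while it runs; with a live IndexWriter, files can disappear in the middle of a copy, which is what the snapshot mechanism guards against.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: mirrors an index directory
// into a backup directory by comparing file names only.
final class WriteOnceBackup {

    // Collects the names of the regular files directly inside dir, sorted.
    static Set<String> listNames(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        return names;
    }

    // Copies files that are new in indexDir, deletes files that are gone
    // from it, and returns the names that were copied.
    // Assumption: the index is not being modified during the call.
    static List<String> sync(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        Set<String> source = listNames(indexDir);
        Set<String> backup = listNames(backupDir);
        source.remove("write.lock"); // lock file, carries no index data

        List<String> copied = new ArrayList<>();
        for (String name : source) {
            // A name already present in the backup is never re-copied:
            // this relies on index files being write-once.
            if (!backup.contains(name)) {
                Files.copy(indexDir.resolve(name), backupDir.resolve(name));
                copied.add(name);
            }
        }
        for (String name : backup) {
            if (!source.contains(name)) {
                Files.delete(backupDir.resolve(name));
            }
        }
        return copied;
    }
}
```

Calling WriteOnceBackup.sync(indexDir, backupDir) after each commit would copy only the files created since the previous run. A more careful version would copy the segments_* file last, so that an interrupted backup never contains a commit point that references missing files.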
This means: Yes, every (and really every) file in a Lucene index is write-once. That is the basis of the whole snapshotting concept that Lucene internally uses.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Larry White [mailto:lwh...@tracelink.com]
> Sent: Saturday, September 12, 2015 7:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: mutability of lucene index files
>
> Hi Erick,
>
> Thank you.
>
> Deleting old files is fine (and expected), so it sounds like the segment files
> are immutable (prior to deletion) and the file that handles deletion is
> renamed with every change, so it's effectively immutable, too.
>
> That leaves the segments_* files and segments.gen, if I understand
> correctly.
>
> And thank you for the pointer. I'm hoping to use the same process to backup
> and restore all my data (Lucene and otherwise), and to be able to use an
> incremental approach so that the system doesn't need to be offline too long,
> but I'll definitely take another look at snapshots.
>
> Thanks again
>
>
> On Sat, Sep 12, 2015 at 12:50 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > The Lucene index segment files are immutable, once they're closed,
> > they are never changed. These are things like _1.fdt, _1.tim, etc. All
> > of the files with the same prefix (_1 in my example) comprise a single
> > "segment". Segments _will_, however, disappear. During indexing, two
> > or more segment are combined into a new segment, so _1.*, _2.* and
> > _3.* could be copied to _4.* then _1.*, _2.* and _3.* will be removed.
> >
> > There is one exception to the rule "segment files are not changed",
> > and that's the file that contains information about documents in that
> > segment that have been deleted. Actually that file is re-written to a
> > new name every time a doc is deleted from the segment upon commit.
> >
> > And another exception is that there is a file or two that contains the
> > information about what segments comprise the most recent (hard)
> > commit, in 4x segments_* and segments.gen.
> >
> > So rather than try to wrap your head around all this and then worry
> > about what changes when the next major release comes out, would it
> > work to just use the built-in snapshot process? Here's something I
> > found (but didn't look at very closely) to get you started:
> >
> > http://stackoverflow.com/questions/17753226/lucene-4-3-1-backup-process
> >
> > And there's a link to the Lucene user's list where the question was
> > answered..
> >
> > Best,
> > Erick
> >
> > On Sat, Sep 12, 2015 at 7:59 AM, Larry White <lwh...@tracelink.com> wrote:
> > > Hi,
> > >
> > > I'm writing a backup routine for a system that includes Lucene for
> > > full-text search. The primary data store is based on immutable
> > > files, so it can be backed-up incrementally by copying any new files
> > > (and removing any files that have been deleted from earlier backups).
> > > It's my understanding from brief comments found on the internet that
> > > most, if not all the files that comprise a Lucene index are similarly
> > > immutable.
> > >
> > > Can someone please confirm or deny that statement?
> > >
> > > If the Lucene files are mostly, but not entirely, immutable, it
> > > would be greatly appreciated if the exceptions could be identified.
> > > I would imagine there might be log files that would be mutable, for
> > > example.
> > >
> > > Thank you very much for your help.
> > >
> > > Larry
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
>
> --
> *Larry White | TraceLink Inc. | Principal Software Architect*
> 400 Riverpark Dr. | North Reading, MA | 01864
> e: lwh...@tracelink.com
> www.tracelink.com
>
> *Protect patients, enable health, grow profits, ensure compliance*