Excellent. Thanks!

On Thu, Jan 3, 2019 at 2:33 PM Erick Erickson <[email protected]> wrote:
> 1> A segment is a miniature index that holds part of the total logical
> index; each segment is complete in and of itself.
> All the files with the same prefix comprise a single segment. I.e.
> _0.fdt, _0.fdx, _0.tim... all make up a segment. See:
>
> https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/codecs/lucene70/package-summary.html
>
> Each extension holds different information about that segment.
>
> 2> No. The segments_N file contains a list of the current segments as
> of some commit point. In the absence of active indexing, segments_N
> will contain all the segments in the index directory. There's a lot of
> nuance here that I'm skipping about how segments come and go based on
> background merging and the like, how an "index searcher" only "sees"
> certain segments until a new searcher is opened and the like, but
> that's kind of extraneous at this point.
>
> 3> Yes, kind of. Don't think of it as "files" though, think of it as
> "segments". IOW, if segments 0, 1, 2, 3 are being merged into segment
> 4, then _0.fdt, _1.fdt, _2.fdt and _3.fdt will be merged into _4.fdt
> and so on for all the different extensions. Once all the merging is
> done and a new searcher is opened, _0.*, _1.*, _2.* and _3.* will be
> deleted.
>
> 4> Pretty much. Again, think of it as segments rather than files
> though. Here's Mike McCandless' excellent blog on the topic:
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> TieredMergePolicy (TMP) is the default (third graphic down IIRC).
> Basically, your maxMergeAtOnce being set to 10 means that 10 roughly
> same-sized segments will be merged into a new segment. To see the
> idea, let's say maxMergeAtOnce is 3 ('cause it's easier to enumerate
> than 10). Let's further say you have 3 segments, of sizes (in M) 1, 1,
> 100.
> It'd be extremely wasteful to rewrite that 100M segment into a
> new segment just to add 2 more M, so TMP waits until there are three
> smaller segments (1, 1, 1, 100) and merges the three similar-sized
> segments into one, so you wind up with two segments of sizes 3 and 100.
> When there are three 3M segments, they're merged into a 9M segment and
> so on. Incidentally, the default max segment size is 5G, so at some
> point you'll have segments that won't be merged unless they have a lot
> of deleted docs.
>
> I'm skipping a _lot_ here about how "like sized" segments are chosen.
>
> All that said, by and large you should simply ignore this unless
> you're trying to troubleshoot some kind of performance issue...
>
> Best,
> Erick
>
> On Thu, Jan 3, 2019 at 1:58 PM John Wilson <[email protected]> wrote:
> >
> > Hi,
> >
> > I'm watching my index directory while indexing million documents.
> > While my indexer runs, I see a number of files with extensions like
> > tip, doc, tim, fdx, fdt, etc. being created. The total number of these
> > files goes up and down during the run -- from as high as 1500 in the
> > middle of the run to 290 when the indexer completes. Finally, I see
> > an additional file, segments_1, being created.
> >
> > My questions:
> >
> > What exactly is a segment?
> >
> > In my case, does it mean that I just have 1 segment since I have just
> > one segments_1 file? Or,
> >
> > Is it the case that files of the same type (extension) get merged
> > together into bigger files? For example, many fdt files being merged
> > into one or more bigger fdt files?
> >
> > maxMergeAtOnce specifies the # of segments to merge at once. In my
> > case, what does this mean? If I set it to 10, for example, does it
> > mean that once the # of files for a specific file type (e.g. fdt)
> > reaches 10, it is combined into a single fdt file?
> >
> > Thanks in advance!
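
A quick way to check how many segments a directory like the one in the question holds is to group the file names by prefix, as answer 1> describes. The following is only a rough sketch in Python, not anything from Lucene itself: the function name group_by_segment and the sample file list are made up for illustration, and the pattern assumes segment names are an underscore followed by digits and lowercase letters.

```python
import re
from collections import defaultdict

# Assumption: segment files look like "_0.fdt" or "_0_Lucene50_0.tim",
# i.e. "_" + segment name, then "." or "_". Commit files ("segments_N")
# and "write.lock" do not start with "_" and are skipped.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def group_by_segment(filenames):
    """Map each segment prefix (e.g. "_0") to the files that belong to it."""
    groups = defaultdict(list)
    for name in filenames:
        match = SEGMENT_FILE.match(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


# Hypothetical directory listing, for illustration only.
files = ["_0.fdt", "_0.fdx", "_0.tim", "_1.fdt", "_1.fdx",
         "segments_1", "write.lock"]
segments = group_by_segment(files)
print(len(segments), "segments:", sorted(segments))
# prints: 2 segments: ['_0', '_1']
```

On a real index one would pass os.listdir(index_dir) instead of the sample list. A single segments_1 file alongside many distinct prefixes would confirm answer 2>: one commit point, many segments.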
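
The 1, 1, 100 example in answer 4> can be replayed with a toy model. This is a deliberately simplified sketch and not how TieredMergePolicy is actually implemented: it only merges segments of exactly equal size, and it ignores the 5G max segment size, deleted docs, and everything Erick says he is skipping about how "like sized" segments are chosen. The function name add_segment is hypothetical.

```python
def add_segment(sizes, new_size, max_merge_at_once=3):
    """Add one newly flushed segment, then cascade toy merges.

    Toy rule (an assumption, not Lucene's real logic): whenever
    max_merge_at_once segments have exactly the same size, replace
    them with one segment holding their combined size.
    """
    sizes = sorted(sizes + [new_size])
    merged = True
    while merged:
        merged = False
        for size in sorted(set(sizes)):
            if sizes.count(size) >= max_merge_at_once:
                for _ in range(max_merge_at_once):
                    sizes.remove(size)
                sizes.append(size * max_merge_at_once)
                sizes.sort()
                merged = True
                break  # rescan, since the merge may enable another one
    return sizes


sizes = [1, 1, 100]            # segment sizes in M, as in the example
sizes = add_segment(sizes, 1)  # a third 1M segment arrives
print(sizes)                   # [3, 100]

for _ in range(6):             # six more 1M segments trickle in
    sizes = add_segment(sizes, 1)
print(sizes)                   # [9, 100]
```

The 100M segment is never rewritten in this run, which is the point of the example: small segments are combined among themselves until they grow into the next size tier.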
