Excellent. Thanks!

On Thu, Jan 3, 2019 at 2:33 PM Erick Erickson <[email protected]>
wrote:

> 1> A segment is a miniature index that holds part of the total logical
> index; each segment is complete in and of itself.
> All the files with the same prefix comprise a single segment. I.e.
> _0.fdt, _0.fdx, _0.tim... all make up a segment. See:
>
> https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/codecs/lucene70/package-summary.html
> .
> Each extension holds different information about that segment.
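To make the "same prefix = one segment" point concrete, here is a minimal Python sketch that groups a directory listing by segment name. It is only an illustration, not Lucene code: the helper name group_by_segment and the sample file list are made up, and the regex assumes segment names look like an underscore followed by lowercase letters and digits (so a name carrying an extra suffix after the segment name still lands in the right group).

```python
import re
from collections import defaultdict

# Assumption: a segment name is "_" followed by lowercase letters/digits,
# and it always appears at the very start of the file name.
SEGMENT_PREFIX = re.compile(r"^(_[0-9a-z]+)")

def group_by_segment(filenames):
    """Map each segment name to the sorted list of files that belong to it."""
    segments = defaultdict(list)
    for name in filenames:
        match = SEGMENT_PREFIX.match(name)
        if match is None:
            # segments_N and write.lock are not part of any one segment.
            continue
        segments[match.group(1)].append(name)
    return {prefix: sorted(names) for prefix, names in segments.items()}

# Hypothetical directory listing, for illustration only.
files = ["_0.fdt", "_0.fdx", "_0.tim", "_1.fdt", "_1.fdx",
         "_1_Lucene50_0.tim", "segments_1", "write.lock"]
grouped = group_by_segment(files)
for segment, names in sorted(grouped.items()):
    print(segment, names)
```

Counting the keys of the result gives the number of segments present in the directory, which is a more useful number to watch than the raw file count.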
>
> 2> No. The segments_N file contains a list of the current segments as
> of some commit point. In the absence of active indexing, segments_N
> will contain all the segments in the index directory. There's a lot of
> nuance here that I'm skipping about how segments come and go based on
> background merging and the like, how an "index searcher" only "sees"
> certain segments until a new searcher is opened and the like, but
> that's kind of extraneous at this point.
>
> 3> Yes, kind of. Don't think of it as "files" though, think of it as
> "segments". IOW, if segments 0, 1, 2, 3 are being merged into segment
> 4, then _0.fdt, _1.fdt, _2.fdt and _3.fdt will be merged into _4.fdt
> and so on for all the different extensions. Once all the merging is
> done and a new searcher is opened, _0.*, _1.*, _2.* and _3.* will be
> deleted.
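The merge-then-delete sequence in 3> also explains why the file count in the directory rises and falls during indexing. A rough Python sketch of that bookkeeping follows; it is a toy model with a made-up helper (files_for) and only three extensions, and it tracks file names only, not what Lucene actually writes.

```python
def files_for(segment, extensions):
    """Return the set of file names a segment would have in this toy model."""
    return {f"{segment}.{ext}" for ext in extensions}

extensions = ["fdt", "fdx", "tim"]          # a small subset, for brevity
old_segments = ["_0", "_1", "_2", "_3"]

directory = set()
for seg in old_segments:
    directory |= files_for(seg, extensions)

# Step 1: the merged segment _4 is written alongside the originals,
# so the file count temporarily goes up.
directory |= files_for("_4", extensions)
during_merge = len(directory)

# Step 2: once the merge is done and a new searcher is opened,
# the files of the merged-away segments are deleted.
for seg in old_segments:
    directory -= files_for(seg, extensions)
after_cleanup = sorted(directory)

print(during_merge)    # 15
print(after_cleanup)   # ['_4.fdt', '_4.fdx', '_4.tim']
```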
>
> 4> Pretty much. Again, think of it as segments rather than files
> though. Here's Mike McCandless' excellent blog on the topic:
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> .
> TieredMergePolicy (TMP) is the default (third graphic down IIRC).
> Basically, your maxMergeAtOnce being set to 10 means that 10 roughly
> same-sized segments will be merged into a new segment. To see the idea,
> let's say maxMergeAtOnce is 3 ('cause it's easier to enumerate
> than 10). Let's further say you have 3 segments, of sizes (in M) 1, 1,
> 100. It'd be extremely wasteful to rewrite that 100M segment into a
> new segment just to add 2 more M, so TMP waits until there are three
> smaller segments (1, 1, 1, 100) and merges the three similar-sized
> segments into one, so you wind up with two segments of sizes 3 and 100.
> When there are 3 3M segments, they're merged into a 9M segment and so
> on. Incidentally, the default max segment size is 5G so at some point
> you'll have segments that won't be merged unless they have a lot of
> deleted docs.
>
> I'm skipping a _lot_ here about how "like sized" segments are chosen.
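The 1, 1, 1, 100 example can be replayed with a small Python toy. This is deliberately not TieredMergePolicy's real selection logic: it treats "like sized" as "exactly equal size", ignores deleted docs and the max segment size, and the function name add_segment is invented for the illustration.

```python
def add_segment(segments, size, max_merge_at_once=3):
    """Add one newly flushed segment, then merge while enough equal-sized
    segments exist. Toy stand-in for 'roughly same-sized' selection."""
    segments = sorted(segments + [size])
    merged = True
    while merged:
        merged = False
        for candidate in sorted(set(segments)):
            if segments.count(candidate) >= max_merge_at_once:
                # Replace max_merge_at_once equal segments with one big one.
                for _ in range(max_merge_at_once):
                    segments.remove(candidate)
                segments.append(candidate * max_merge_at_once)
                segments.sort()
                merged = True
                break
    return segments

# Sizes in M: the third 1M segment triggers a merge; the 100M one is untouched.
first = add_segment([1, 1, 100], 1)
print(first)      # [3, 100]

# Nine 1M flushes next to a 100M segment cascade into a single 9M segment.
segments = [100]
for _ in range(9):
    segments = add_segment(segments, 1)
print(segments)   # [9, 100]
```

Note that the 100M segment is never rewritten in either run, which is the whole point of merging only similar-sized segments.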
>
> All that said, by and large you should simply ignore this unless
> you're trying to troubleshoot some kind of performance issue.
>
> Best,
> Erick
>
> On Thu, Jan 3, 2019 at 1:58 PM John Wilson <[email protected]>
> wrote:
> >
> > Hi,
> >
> > I'm watching my index directory while indexing a million documents. While
> > my indexer runs, I see a number of files with extensions like tip, doc,
> > tim, fdx, fdt, etc. being created. The total number of these files goes up
> > and down during the run -- from as high as 1500 in the middle of the run to
> > 290 when the indexer completes. Finally, I see an additional file,
> > segments_1, being created.
> >
> > My questions:
> >
> > 1. What exactly is a segment?
> > 2. In my case, does it mean that I have just 1 segment, since I have just
> > one segments_1 file? Or,
> > 3. Is it the case that files of the same type (extension) get merged
> > together into bigger files? For example, many fdt files being merged into
> > one or more bigger fdt files?
> > 4. maxMergeAtOnce specifies how many segments to merge at once. In my
> > case, what does this mean? If I set it to 10, for example, does it mean
> > that once the # of files for a specific file type (e.g. fdt) reaches 10,
> > they are combined into a single fdt file?
> >
> > Thanks in advance!
>
