The latest terms dictionary is "block tree", and unfortunately there are no guides for it, besides of course the source code (BlockTreeTermsWriter/Reader). See especially the comments in those sources: they point to a paper describing the inspiration for this implementation.
The high-level view is that this terms dictionary breaks up the sorted terms into variable-sized blocks (25 to 48 terms in each block) at "good" boundaries, where the term prefixes change, to maximize overall compression. The in-memory (JVM heap) FST terms index is used to find which on-disk block may contain a given term. So to look up a term, we walk the FST, then seek to that block and scan it.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 6, 2016 at 12:04 PM, Mohit Sidana <[email protected]> wrote:

> Hello,
>
> I am interested to learn more about how Lucene uses the block tree term
> dictionary.
>
> While doing research on this topic I found some useful information at
> the links below.
>
>    1. http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
>    2. http://blog.mikemccandless.com/2013/09/lucene-now-has-in-memory-terms.html
>    3. http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal
>
> I do understand that Lucene uses an FST to store prefixes of terms in
> memory and looks up terms/postings on disk, but I am unable to visualize
> how an actual search works in Lucene 6.0.
>
> Can someone please suggest a guide I can follow to understand, step by
> step, how a term search actually works with the blockterms dictionary?
>
> Thanks.
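To help visualize the two-level lookup described in the reply (an in-memory index picks one block, then that block is scanned), here is a toy Java sketch. It is not Lucene's code or API, and several parts are assumptions. `ToyBlockTermsDict` is a hypothetical class. A `java.util.TreeMap` keyed by each block's first term stands in for the FST terms index, and blocks are kept in a `List` rather than on disk. The "good boundary" rule is also a crude stand-in: cut once a block holds at least 25 terms and the first character of the next term changes, or force a cut at 48. Real BlockTree chooses boundaries by shared prefixes and writes nested blocks; BlockTreeTermsWriter/Reader remain the reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy two-level terms dictionary loosely modeled on the "block tree" idea.
 * Hypothetical simplifications: blocks live in a List instead of on disk,
 * and a TreeMap from each block's first term stands in for the FST index.
 */
public class ToyBlockTermsDict {
    static final int MIN_BLOCK = 25; // min terms per block (figure from the email)
    static final int MAX_BLOCK = 48; // max terms per block (figure from the email)

    private final List<List<String>> blocks = new ArrayList<>();
    private final TreeMap<String, Integer> index = new TreeMap<>();

    /** Builds blocks from terms that must already be sorted, unique and non-empty. */
    public ToyBlockTermsDict(List<String> sortedTerms) {
        List<String> current = new ArrayList<>();
        for (String term : sortedTerms) {
            boolean full = current.size() >= MAX_BLOCK;
            // Crude stand-in for a "good" boundary: once the block has
            // MIN_BLOCK terms, cut where the term's first character changes.
            boolean prefixChanged = current.size() >= MIN_BLOCK
                && current.get(current.size() - 1).charAt(0) != term.charAt(0);
            if (full || prefixChanged) {
                addBlock(current);
                current = new ArrayList<>();
            }
            current.add(term);
        }
        if (!current.isEmpty()) {
            addBlock(current);
        }
    }

    private void addBlock(List<String> block) {
        index.put(block.get(0), blocks.size()); // index the block by its first term
        blocks.add(block);
    }

    public int blockCount() {
        return blocks.size();
    }

    /** Step 1: use the in-memory index to pick the one block that may hold the term. */
    int findBlock(String term) {
        Map.Entry<String, Integer> e = index.floorEntry(term);
        return e == null ? -1 : e.getValue();
    }

    /** Step 2: "seek" to that block and scan it. */
    public boolean contains(String term) {
        int b = findBlock(term);
        if (b < 0) {
            return false; // term sorts before every indexed term
        }
        for (String t : blocks.get(b)) {
            int cmp = t.compareTo(term);
            if (cmp == 0) {
                return true;
            }
            if (cmp > 0) {
                return false; // block is sorted, so we can stop early
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (char c = 'a'; c <= 'c'; c++) {
            for (int i = 0; i < 40; i++) {
                terms.add(c + String.format("%03d", i)); // e.g. "a000" .. "c039"
            }
        }
        ToyBlockTermsDict dict = new ToyBlockTermsDict(terms);
        System.out.println(dict.blockCount()); // 3
        System.out.println(dict.contains("b017")); // true
        System.out.println(dict.contains("b099")); // false
    }
}
```

Running `main` prints 3, true and false. Each letter's 40 terms become one block because the first character changes after the 25-term minimum. A lookup for "b017" or "b099" touches only the "b" block: the index narrows the search to a single block, and only that block is scanned.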
