+1 to MyCoy's suggestion.

To answer your most immediate questions:
 - Lucene mostly loads metadata into memory at the time a segment is opened
(the dvm, tmd, fdm, vem, nvm and kdm files); other files are memory-mapped,
and Lucene relies on the filesystem cache to make their data efficiently
available. This allows Lucene to have a very small memory footprint for
searching (see the first sketch after this list).
 - Finite state machines are mostly used for suggesters and for the terms
index (the tip file), which essentially stores in an FST all prefixes that
are shared by 25-40 terms (see the second sketch after this list).
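
To make the first point concrete, here is a minimal, untested sketch (the
index path, field name and query term are made up): opening a reader over an
MMapDirectory is the moment the per-segment metadata files get read, while
postings, doc values, etc. stay on disk and are paged in through the OS cache
as they are accessed.

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.MMapDirectory;

    public class MMapSearchExample {
      public static void main(String[] args) throws Exception {
        // MMapDirectory maps segment files into virtual memory; the OS page
        // cache serves the bytes, so the JVM heap stays small.
        try (MMapDirectory dir = new MMapDirectory(Paths.get("/path/to/index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
          // Opening the reader is when the per-segment metadata files are read
          // eagerly; everything else is lazily paged in during searches.
          IndexSearcher searcher = new IndexSearcher(reader);
          TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
          System.out.println("total hits: " + hits.totalHits);
        }
      }
    }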
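
For the second point, the FST data structure that backs the terms index lives
in org.apache.lucene.util.fst and can be experimented with directly. Below is
a small sketch adapted from that package's javadoc example; it assumes the
older Builder API (more recent Lucene versions renamed this class to
FSTCompiler, with a slightly different construction) and maps a few sorted
terms to long outputs:

    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.BytesRefBuilder;
    import org.apache.lucene.util.IntsRefBuilder;
    import org.apache.lucene.util.fst.Builder;
    import org.apache.lucene.util.fst.FST;
    import org.apache.lucene.util.fst.PositiveIntOutputs;
    import org.apache.lucene.util.fst.Util;

    public class FstExample {
      public static void main(String[] args) throws Exception {
        // Inputs must be added in sorted order, just like terms in a segment.
        String[] inputValues = {"cat", "dog", "dogs"};
        long[] outputValues = {5, 7, 12};

        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
        BytesRefBuilder scratchBytes = new BytesRefBuilder();
        IntsRefBuilder scratchInts = new IntsRefBuilder();
        for (int i = 0; i < inputValues.length; i++) {
          scratchBytes.copyChars(inputValues[i]);
          builder.add(Util.toIntsRef(scratchBytes.get(), scratchInts), outputValues[i]);
        }
        FST<Long> fst = builder.finish();

        // Shared prefixes ("dog"/"dogs") are stored only once, which is what
        // makes the terms index so compact.
        Long value = Util.get(fst, new BytesRef("dog"));
        System.out.println(value); // 7
      }
    }

Stepping through the block tree terms dictionary code with a debugger, as
MyCoy suggests, is a good way to see how these FSTs are written to and read
from the tip file.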

On Sun, Nov 6, 2022 at 2:12 AM MyCoy Z <mycoy.zh...@gmail.com> wrote:

> I just started learning the Lucene HNSW source code in the last few months.
>
> I find the most effective way is to start with the test cases, set
> breakpoints in the code you're interested in, and walk through the code.
>
> Regards
> MyCoy
>
> On Fri, Nov 4, 2022 at 9:24 PM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Hello,
> > I have been working with Lucene and Solr for quite some time and have a
> > good understanding of a lot of the moving parts at the code level.
> > However, I wish to learn Lucene internals from the ground up and want to
> > familiarize myself with all the dirty details. I would like to know what
> > would be the best way to go about it.
> >
> > To kick things off, I have been thinking about picking up “Lucene in
> > Action”, but have been hesitant (possibly wrongly) since it is based on
> > Lucene 3.0 and we have come a long way since then. To give an example of
> > the level of detail I wish to learn (among other things): what parts of a
> > segment (.tim, .tip, etc.) get loaded into memory at search time, which
> > parts use finite state machines and why, and so on.
> >
> > I would really appreciate any thoughts/inputs on how I can go about this.
> > Thanks in advance!
> >
> > Regards,
> > Rahul
> >
>


-- 
Adrien
