Thanks, Stephen. I have asked my questions at solr-u...@lucene.apache.org.

On Mon, Nov 11, 2019 at 11:27 AM Stephen Bianamara <sbianam...@panopto.com> wrote:
> Siddharth -- Part of the confusion here is that this is not the right email
> list to ask. General is about releases, publicity, and things of that
> nature. Technical threads like this are more suited for
> solr-u...@lucene.apache.org. Please subscribe there and redirect your
> question there instead.
>
> Best,
> Stephen
>
> On Mon, Nov 11, 2019 at 11:18 AM siddharth teotia <siddharthteo...@gmail.com> wrote:
>
> > Hi Michael,
> >
> > Can you or someone from the community please help answer my questions?
> >
> > Thanks,
> > Siddharth
> >
> > On Thu, Nov 7, 2019 at 7:50 AM siddharth teotia <siddharthteo...@gmail.com> wrote:
> >
> > > Hi Michael,
> > >
> > > Thanks a lot for your response. A couple more questions:
> > >
> > > (1) During indexing, is there any knob to tell the writer to use off-heap
> > > memory for buffering? I didn't find anything in the docs, so the answer is
> > > probably no. Just confirming.
> > >
> > > (2) In my experiments, I have ingested up to 5 million documents into the
> > > Lucene index, and the number of segments created was 1. The writer was
> > > committed and closed after all the documents were ingested, and after that
> > > we do not need to index any more, so it is essentially an immutable index.
> > > Basically, I wanted to find the threshold for creating a new segment. Is it
> > > pretty high? Or, if the writer is reopened, will the next set of documents
> > > go into the next segment, and so on? The reason for asking is to find the
> > > total number of files (per index) that will be opened during querying. So
> > > far, since there was a single segment, only that segment's .cfs file was
> > > opened.
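Regarding question (2), one rough way to see how many segments and files a committed index has is to group the index directory's file names by segment. The JDK-only sketch below assumes Lucene's usual file naming (per-segment files start with `_` plus the segment name, such as `_0.cfs` or `_0_1.liv`; `segments_N` is the commit point). `SegmentFiles`, `segmentOf`, and `filesPerSegment` are hypothetical names, not Lucene API. It counts files on disk, which is only an upper bound on what a reader keeps open: with compound files, small metadata files such as `.si` and `.cfe` are typically read once when the reader opens, while the `.cfs` file stays open. Inside Lucene itself, `DirectoryReader.open(dir).leaves().size()` gives the segment count directly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SegmentFiles {
    // Returns the segment a file belongs to ("_0" for "_0.cfs" or "_0_1.liv"),
    // or null for non-segment files such as "segments_2" and "write.lock".
    // Assumes the "_<name>" naming convention described above.
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.', 1);
        int underscore = fileName.indexOf('_', 1);
        if (dot >= 0) {
            end = Math.min(end, dot);
        }
        if (underscore >= 0) {
            end = Math.min(end, underscore);
        }
        return fileName.substring(0, end);
    }

    // Counts on-disk files per segment; the map's size is the number of segments.
    static Map<String, Integer> filesPerSegment(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fileNames) {
            String segment = segmentOf(name);
            if (segment != null) {
                counts.merge(segment, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Usage: java SegmentFiles /path/to/index
    public static void main(String[] args) throws IOException {
        try (Stream<Path> files = Files.list(Path.of(args[0]))) {
            List<String> names = files.map(p -> p.getFileName().toString())
                                      .collect(Collectors.toList());
            Map<String, Integer> perSegment = filesPerSegment(names);
            System.out.println(perSegment.size() + " segment(s): " + perSegment);
        }
    }
}
```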
> > >
> > > Thanks,
> > > Siddharth
> > >
> > > On Thu, Nov 7, 2019, 6:39 AM Michael McCandless <luc...@mikemccandless.com> wrote:
> > >
> > >> Hi Siddharth,
> > >>
> > >> Your understanding of MMapDirectory is correct -- give your JVM only
> > >> enough heap that it doesn't spend too much CPU on GC, and then let the OS
> > >> use all the remaining RAM to cache hot pages from your index.
> > >>
> > >> There are some structures Lucene loads into the JVM heap, but some of
> > >> those have recently been moved off-heap (accessed via Directory), such as
> > >> the FSTs used for the terms index and the BKD index (for dimensional
> > >> points). I'm not sure exactly which structures are still on heap ...
> > >> maybe the live documents bitset?
> > >>
> > >> During indexing, recently indexed documents are buffered in the JVM heap
> > >> until they reach the limit set by IndexWriterConfig.setRAMBufferSizeMB;
> > >> then they are written to the Directory as new segments.
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >>
> > >> On Wed, Nov 6, 2019 at 11:27 PM siddharth teotia <siddharthteo...@gmail.com> wrote:
> > >>
> > >>> Hi All,
> > >>>
> > >>> I have some questions about memory usage. I would really appreciate it
> > >>> if someone could help answer these.
> > >>>
> > >>> I understand from the docs that during reading/querying, Lucene uses
> > >>> MMapDirectory (assuming it is supported on the platform). So the Java
> > >>> heap overhead in this case will come purely from the objects that are
> > >>> allocated/instantiated on the query path to process the query, build
> > >>> results, etc. But the whole index itself will not be loaded into memory,
> > >>> because the files are memory-mapped. Is my understanding correct? In
> > >>> this case, we are better off not increasing the Java heap and keeping as
> > >>> much memory as possible available for the file system cache so that mmap
> > >>> can do its job efficiently.
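To make the memory-mapping point concrete: in the Lucene versions of this period (8.x), MMapDirectory is built on the JDK's `FileChannel.map`. The JDK-only sketch below does not use Lucene, and `MmapDemo`, `mapReadOnly`, `sumBytes`, and `mapSample` are hypothetical names. It maps a small stand-in file and shows that the result is a direct (off-heap) `MappedByteBuffer`: its bytes are served from the OS page cache and paged in on access instead of being copied into the Java heap, which is why free RAM for the page cache matters more than heap size for mmap'd index data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Maps a whole file read-only. The buffer is backed by the OS page cache, not by a
    // copy of the file in the Java heap. Per the FileChannel.map contract, the mapping
    // stays valid after the channel is closed.
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // Sums unsigned byte values; for a mapped buffer, pages are faulted in as they are read.
    static long sumBytes(ByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i < buf.limit(); i++) {
            sum += buf.get(i) & 0xFF;
        }
        return sum;
    }

    // Writes a tiny hypothetical stand-in for an index file to a temp location and maps it.
    static MappedByteBuffer mapSample() {
        try {
            Path tmp = Files.createTempFile("segment", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, (byte) 250});
            return mapReadOnly(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        MappedByteBuffer buf = mapSample();
        System.out.println("off-heap (direct): " + buf.isDirect() + ", byte sum: " + sumBytes(buf));
    }
}
```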
> > >>>
> > >>> However, are there any portions of the index structures that are
> > >>> completely loaded into memory regardless of whether MMapDirectory is
> > >>> used or not? If so, are they loaded in the Java heap, or do we use
> > >>> off-heap memory (direct buffers) in such cases?
> > >>>
> > >>> Secondly, on the write path, I think that even though the writer opens
> > >>> an MMapDirectory, the writes are gathered/buffered in memory up to a
> > >>> flush threshold controlled by IndexWriterConfig. Is this buffering done
> > >>> in the Java heap or in direct memory?
> > >>>
> > >>> Thanks a lot for the help,
> > >>> Siddharth
> > >>>
> > >>
> > >
> >
> > --
> > *Best Regards,*
> > *SIDDHARTH TEOTIA*
> > *2008C6PS540G*
> > *BITS PILANI- GOA CAMPUS*
> >
> > *+91 87911 75932*
> >
> >
>
> --
> Thanks!
>
> Stephen Bianamara
> Search Technology - Technical Lead
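On the write-path question: as Mike describes, documents are buffered in the JVM heap until the `IndexWriterConfig.setRAMBufferSizeMB` budget (16 MB by default) is reached, and are then flushed as a new segment. The toy model below is a deliberately simplified, hypothetical sketch of just that rule (`ToyRamBufferedWriter` and its methods are illustrative names, not Lucene code). It also assumes that a commit flushes whatever is still buffered, so documents added afterwards (for example by a reopened writer) land in a new segment. It ignores what real Lucene does on top: per-thread buffers that can flush several segments at once, and background merges that can reduce the segment count again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of RAM-triggered flushing; NOT Lucene's IndexWriter.
public class ToyRamBufferedWriter {
    private final long ramBufferBytes;                 // analogous in spirit to setRAMBufferSizeMB
    private final List<Integer> flushedSegments = new ArrayList<>(); // doc count per flushed segment
    private long bufferedBytes = 0;                    // estimated heap used by buffered docs
    private int bufferedDocs = 0;

    ToyRamBufferedWriter(double ramBufferSizeMB) {
        this.ramBufferBytes = (long) (ramBufferSizeMB * 1024 * 1024);
    }

    // Buffers one document (represented only by its estimated in-heap size) and
    // flushes a new segment as soon as the buffer reaches the budget.
    void addDocument(long estimatedBytes) {
        bufferedBytes += estimatedBytes;
        bufferedDocs++;
        if (bufferedBytes >= ramBufferBytes) {
            flush();
        }
    }

    // Commit flushes whatever is still buffered, even if the budget was never reached.
    void commit() {
        if (bufferedDocs > 0) {
            flush();
        }
    }

    private void flush() {
        flushedSegments.add(bufferedDocs);
        bufferedBytes = 0;
        bufferedDocs = 0;
    }

    List<Integer> segments() {
        return flushedSegments;
    }
}
```

For example, with a 1 MB budget and documents estimated at 1 KB each, 2,500 additions followed by a commit yield segments of 1,024, 1,024, and 452 documents; 10 more additions and another commit add a fourth segment of 10 documents.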