[
https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265916#comment-13265916
]
Simon Svensson commented on LUCENENET-488:
------------------------------------------
The following may be off since I don't know the inner technical workings of
Lucene.Net.
All terms in your index are read into an in-memory index when opening an
IndexReader. The termInfosIndexDivisor tells the IndexReader instance to read
only every n-th term into this index. The default value, 1, causes every term
to be loaded into memory. Using termInfosIndexDivisor=2 means that only every
second term is read into memory, theoretically halving the required memory.
Your value, 10, would consume only a tenth of the memory compared to
termInfosIndexDivisor=1.
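As a rough illustration of the memory trade-off described above, here is a minimal Python sketch. It is not Lucene.Net code: the function name build_sparse_index and the generated term names are made up for the example, and it follows the simplified model in this comment (a sorted term list from which every n-th entry is kept in memory).

```python
def build_sparse_index(sorted_terms, divisor):
    """Keep every `divisor`-th term (positions 0, divisor, 2*divisor, ...).

    Hypothetical helper for illustration; it only models the idea of
    sampling the term list, not how Lucene.Net stores it.
    """
    if divisor < 1:
        raise ValueError("divisor must be >= 1")
    return sorted_terms[::divisor]


# 100,000 made-up terms, already in sorted order.
terms = [f"term{i:06d}" for i in range(100_000)]

for divisor in (1, 2, 10):
    in_memory = build_sparse_index(terms, divisor)
    print(f"divisor={divisor}: {len(in_memory)} terms held in memory")
# divisor=1: 100000 terms held in memory
# divisor=2: 50000 terms held in memory
# divisor=10: 10000 terms held in memory
```

The number of in-memory entries shrinks in proportion to the divisor, which is the "theoretically halving" effect mentioned above.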
This comes at a price: as 9 out of 10 terms are not cached in memory, they take
longer to retrieve. Such lookups happen in many cases, for example with a
new TermQuery(new Term("f", "test")). The reader needs to seek to the nearest
indexed term, then iterate forward until it reaches the requested term. If
"teargas" was the indexed term, this could be:
teargas > technicians > tegument > teleconference > temporal > tenotomy >
teocalli > terbium > test. Instead of being able to seek directly to the term,
we now seek to an earlier term and iterate the list for another 8 terms. (It
would still go faster than the time it took me to find odd example words...)
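The seek-then-scan lookup can be sketched in Python using the example words from this comment. This is only a model under stated assumptions: the function name lookup is hypothetical, a plain list stands in for the on-disk term list, and the binary search over the in-memory sample is my assumption, not something confirmed about Lucene.Net internals.

```python
from bisect import bisect_right

# The example words from the comment, in sorted order.
TERMS = [
    "teargas", "technicians", "tegument", "teleconference", "temporal",
    "tenotomy", "teocalli", "terbium", "test",
]


def lookup(sorted_terms, divisor, target):
    """Return (position, terms_scanned); position is None if not found.

    Hypothetical helper: seek to the closest sampled term at or before
    `target`, then walk forward through the full list.
    """
    indexed = sorted_terms[::divisor]          # the in-memory sample
    slot = bisect_right(indexed, target) - 1   # last sampled term <= target
    if slot < 0:
        return None, 0                         # target sorts before all terms
    pos = slot * divisor                       # position in the full list
    scanned = 0
    while pos < len(sorted_terms) and sorted_terms[pos] < target:
        pos += 1
        scanned += 1
    if pos < len(sorted_terms) and sorted_terms[pos] == target:
        return pos, scanned
    return None, scanned


print(lookup(TERMS, 1, "test"))   # (8, 0): direct hit, nothing to scan
print(lookup(TERMS, 10, "test"))  # (8, 8): seek to "teargas", scan 8 terms
```

With a divisor of 10 only "teargas" is sampled, so finding "test" costs 8 extra steps, matching the walk-through above.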
I've never measured this, but I doubt that low values will cause much trouble.
Looking up any term except "teargas" would require reading term information
from disk, and this disk read will [probably] end up in the file system cache.
I can see a problem if the value is high enough to cause a second disk read,
but at what value of termInfosIndexDivisor this happens is system-dependent.
The size of the disk reads, the amount of data per term, etc., would affect
this. I guess you could use a low-level monitoring tool (Process Monitor?) to
see every read if you really want to find the "perfect" number.
I believe this bug report can be closed as invalid; it was a case of default
values that did not work out for a 200 GiB index. Do you agree, Steven?
> Can't open IndexReader, get OutOFMemory Exception
> -------------------------------------------------
>
> Key: LUCENENET-488
> URL: https://issues.apache.org/jira/browse/LUCENENET-488
> Project: Lucene.Net
> Issue Type: Bug
> Components: Lucene.Net Core
> Affects Versions: Lucene.Net 2.9.4g
> Environment: Windows server 2008R2
> Reporter: Steven
>
> I have built a large database with ~1Bn records (2 items per document); it is
> 200GB on disk. I managed to write the index by chunking into blocks of
> 100,000, as I ended up with some threading issues (another bug submission).
> Anyway, the index is built, but I can't open it and get a memory exception
> (Process Explorer gets to 1.5GB allocated before it dies; not sure how
> reliable that is, but I do know there is plenty more RAM left on the box).
> Stack trace below:
> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
>    at Lucene.Net.Index.TermInfosReader..ctor(Directory dir, String seg, FieldInfos fis, Int32 readBufferSize, Int32 indexDivisor)
>    at Lucene.Net.Index.SegmentReader.CoreReaders..ctor(SegmentReader origInstance, Directory dir, SegmentInfo si, Int32 readBufferSize, Int32 termsIndexDivisor)
>    at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, Directory dir, SegmentInfo si, Int32 readBufferSize, Boolean doOpenStores, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, SegmentInfo si, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.DirectoryReader..ctor(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, Boolean readOnly, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.DirectoryReader.<>c__DisplayClass1.<Open>b__0(String segmentFileName)
>    at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
>    at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.IndexReader.Open(String path, Boolean readOnly)
>    at Lucene.Net.Demo.SearchFiles.Main(String[] args)