[ 
https://issues.apache.org/jira/browse/LUCENENET-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265916#comment-13265916
 ] 

Simon Svensson commented on LUCENENET-488:
------------------------------------------

The following may be off since I don't know the inner technical workings of 
Lucene.Net.

All terms in your index are read into an in-memory index when opening an 
IndexReader. The termInfosIndexDivisor tells the IndexReader instance to read 
only every n-th term into this index. The default value, 1, will cause every 
term to be loaded into memory. Using termInfosIndexDivisor=2 means that you'll 
read every second term into memory, theoretically halving the required memory 
size. Your value, 10, would consume only a tenth of the memory compared to 
termInfosIndexDivisor=1.
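
To make the memory effect concrete, here is a minimal Python sketch of the 
simplified model described above. It is not Lucene.Net code: the function name 
build_term_index and the generated term list are made up for illustration, and 
the real reader keeps more per-term data than a plain string.

```python
def build_term_index(sorted_terms, divisor=1):
    """Keep every divisor-th term (positions 0, divisor, 2*divisor, ...).

    Hypothetical stand-in for the in-memory term index; the real structure
    in Lucene.Net is more involved.
    """
    if divisor < 1:
        raise ValueError("divisor must be >= 1")
    return sorted_terms[::divisor]


# 1000 made-up terms, already in sorted order.
terms = [f"term{i:04d}" for i in range(1000)]

full_index = build_term_index(terms, divisor=1)
sparse_index = build_term_index(terms, divisor=10)

print(len(full_index), len(sparse_index))  # prints "1000 100"
```

The number of entries held in memory shrinks by the divisor, which is where 
the "a tenth of the memory" estimate comes from.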

This comes at a price: since 9 out of 10 terms are not cached in memory, they 
take longer to retrieve. Term lookups happen in many cases, such as a new 
TermQuery(new Term("f", "test")). The reader needs to seek to the nearest 
indexed term before the target, then iterate forward until it reaches the 
correct term. If "teargas" was the indexed term, this could be: 
teargas > technicians > tegument > teleconference > temporal > tenotomy > 
teocalli > terbium > test. Instead of being able to seek directly to the term, 
we now seek to an earlier term and iterate the list for another 8 terms. (It 
would still go faster than the time it took for me to find odd example words...)
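
The seek-then-scan lookup can be sketched in Python as follows. This only 
mirrors the simplified model from this comment, not the actual Lucene.Net 
implementation: the function lookup is hypothetical, and "tetrahedron" is 
added to the example words just to fill a block of ten terms.

```python
from bisect import bisect_right


def lookup(sorted_terms, divisor, target):
    """Return (position, terms_scanned) for target, or (None, terms_scanned).

    Only every divisor-th term is treated as being in memory; the rest are
    reached by scanning forward from the nearest indexed term.
    """
    indexed = sorted_terms[::divisor]
    # Last indexed term that is <= target.
    slot = bisect_right(indexed, target) - 1
    if slot < 0:
        return None, 0  # target sorts before the first term
    scanned = 0
    for pos in range(slot * divisor, len(sorted_terms)):
        term = sorted_terms[pos]
        if term == target:
            return pos, scanned
        if term > target:
            break  # passed the spot where target would be
        scanned += 1
    return None, scanned


words = ["teargas", "technicians", "tegument", "teleconference", "temporal",
         "tenotomy", "teocalli", "terbium", "test", "tetrahedron"]

print(lookup(words, 1, "test"))   # prints "(8, 0)": direct hit
print(lookup(words, 10, "test"))  # prints "(8, 8)": 8 terms scanned first
```

With a divisor of 1 the term is found directly; with a divisor of 10 the 
lookup starts at "teargas" and passes 8 terms before it reaches "test".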

I've never measured this, but I doubt that low numbers will cause much trouble. 
Any term except "teargas" would need to read the term information from disk, 
and this disk read will [probably] end up in the file system cache. I can see a 
problem if the number is high enough to cause a second disk read, but at what 
value of termInfosIndexDivisor this happens is system-dependent. The size of 
the disk reads, the amount of data per term, etc., would affect this. I guess 
you could use a low-level monitoring tool (Process Monitor?) to see every read 
if you really want to find the "perfect" number.

I believe this bug report can be closed as invalid; it was a case of default 
values that did not work out for a 200 GiB index. Do you agree, Steven?
                
> Can't open IndexReader, get OutOFMemory Exception
> -------------------------------------------------
>
>                 Key: LUCENENET-488
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-488
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.4g
>         Environment: Windows server 2008R2
>            Reporter: Steven
>
> Have built a large database with ~1Bn records (2 items per document); it has 
> size 200GB on disk. I managed to write the index by chunking into 100,000 
> blocks as I ended up with some threading issues (another bug submission). 
> Anyway, the index is built but I can't open it and get a memory exception 
> (process explorer gets to 1.5GB allocated before it dies; not sure how 
> reliable that is, but I do know there is plenty more RAM left on the box).
> Stack trace below:
> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
>    at Lucene.Net.Index.TermInfosReader..ctor(Directory dir, String seg, FieldInfos fis, Int32 readBufferSize, Int32 indexDivisor)
>    at Lucene.Net.Index.SegmentReader.CoreReaders..ctor(SegmentReader origInstance, Directory dir, SegmentInfo si, Int32 readBufferSize, Int32 termsIndexDivisor)
>    at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, Directory dir, SegmentInfo si, Int32 readBufferSize, Boolean doOpenStores, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.SegmentReader.Get(Boolean readOnly, SegmentInfo si, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.DirectoryReader..ctor(Directory directory, SegmentInfos sis, IndexDeletionPolicy deletionPolicy, Boolean readOnly, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.DirectoryReader.<>c__DisplayClass1.<Open>b__0(String segmentFileName)
>    at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run(IndexCommit commit)
>    at Lucene.Net.Index.DirectoryReader.Open(Directory directory, IndexDeletionPolicy deletionPolicy, IndexCommit commit, Boolean readOnly, Int32 termInfosIndexDivisor)
>    at Lucene.Net.Index.IndexReader.Open(String path, Boolean readOnly)
>    at Lucene.Net.Demo.SearchFiles.Main(String[] args)


        
