[
https://issues.apache.org/jira/browse/LUCENE-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1520:
----------------------------------
Attachment: LUCENE-1520.patch
Again a slightly improved patch: the byte[] is now allocated only once for all
fields in CheckIndex. The length check is unnecessary because the array is
preallocated to maxDoc; instead, that check was moved and modified to compare
the SegmentInfo docCount against the reader's maxDoc.
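
For illustration only, here is a minimal sketch of that approach against the plain IndexReader API (the class and method names below are made up for this example; it is not the patch itself): a single byte[] of length maxDoc() is allocated up front and refilled per field through the non-caching 3-argument norms() method.

{code:java}
import java.io.IOException;
import java.util.Iterator;
import org.apache.lucene.index.IndexReader;

// Sketch only (hypothetical class, not from the patch): reuse one preallocated
// buffer for every field's norms instead of letting the reader cache a new
// byte[maxDoc] per field.
public class NormsCheckSketch {
  public static void checkAllNorms(IndexReader reader) throws IOException {
    final byte[] norms = new byte[reader.maxDoc()]; // allocated once; no per-field length check needed
    Iterator it = reader.getFieldNames(IndexReader.FieldOption.INDEXED).iterator();
    while (it.hasNext()) {
      String field = (String) it.next();
      if (!reader.hasNorms(field)) {
        continue; // fields without norms share a dummy array; nothing to load
      }
      reader.norms(field, norms, 0); // fills the caller's buffer, creates no cache entry
      // ... per-field validation of 'norms' would go here ...
    }
  }
}
{code}

The actual patch additionally compares the SegmentInfo docCount against the reader's maxDoc(), which is not shown in this sketch.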
> OOM errors with CheckIndex with indexes containing a lot of fields with norms
> --------------------------------------------------------------------------
>
> Key: LUCENE-1520
> URL: https://issues.apache.org/jira/browse/LUCENE-1520
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.9
> Reporter: Uwe Schindler
> Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: LUCENE-1520.patch, LUCENE-1520.patch
>
>
> All index readers have a cache of the last used norms (SegmentReader,
> MultiReader, MultiSegmentReader, ...). This cache is never cleaned up, so once
> you access the norms of a field, that field's byte[maxDoc()] array is not freed
> until you close/reopen the index.
> You can see this problem if you create an index with many fields with norms
> (I tested with about 4,000 fields) and many documents (half a million). If
> you then run CheckIndex, which calls norms() for each (!) field in the
> segment, each of these calls creates a new cache entry and you get
> OutOfMemoryErrors after a short time (I tested with the above index: I was
> not able to run CheckIndex even with "-Xmx 16GB" on 64-bit Java).
> CheckIndex opens and then tests each segment of an index with a separate
> SegmentReader. The big index with the OutOfMemory problem was optimized, so it
> consists of one segment with about half a million docs and about 4,000
> fields. Each byte[] norms array (one byte per document) takes about half a MiB
> for this index. CheckIndex created the norms for all 4,000 fields and the
> SegmentReader cached them, which amounts to about 2 GiB of RAM, so OOMs are
> not unusual.
> In my opinion, the best fix would be to use a Weak- or, better, a SoftReference,
> so norms.bytes becomes a java.lang.ref.SoftReference<byte[]> that is used for
> caching. With proper synchronization (which is already done on the norms cache
> in SegmentReader), a SoftReference works best, as such a reference is only
> cleared by the garbage collector when an OOM would otherwise occur. If the
> byte[] array is freed (and it is only freed if no other references to it
> exist), a later call to getNorms() creates a new array. Code that holds a hard
> reference to the norms array keeps it from being freed, so there is no problem.
> The same could be done for the other IndexReaders (a sketch of this idea
> follows below).
> Fields without norms do not have this problem, as all such fields share a
> single, one-time allocated dummy norms array. So the same index, with norms
> disabled for most of the fields, checked fine.
> I will prepare a patch tomorrow.
> Mike proposed another quick fix for CheckIndex:
> bq. we could do something first specifically for CheckIndex (eg it could
> simply use the 3-arg non-caching bytes method instead) to prevent OOM errors
> when using it.
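
For reference, a rough sketch of the SoftReference caching idea proposed in the description above; the holder class and its loader callback are hypothetical and not the real SegmentReader internals.

{code:java}
import java.io.IOException;
import java.lang.ref.SoftReference;

// Hypothetical holder (not SegmentReader's real Norm class): keeps the norms
// bytes behind a SoftReference so the JVM may reclaim them under memory
// pressure; the next access simply re-reads them from the index.
class SoftNormsCache {
  /** Hypothetical callback that re-reads the norms from disk when needed. */
  interface NormsLoader {
    byte[] load() throws IOException;
  }

  private SoftReference<byte[]> bytesRef;

  synchronized byte[] bytes(NormsLoader loader) throws IOException {
    byte[] bytes = (bytesRef == null) ? null : bytesRef.get();
    if (bytes == null) {           // never loaded, or cleared by the garbage collector
      bytes = loader.load();
      bytesRef = new SoftReference<byte[]>(bytes);
    }
    return bytes;                  // callers holding a hard reference keep the array alive
  }
}
{code}

Because a SoftReference is only cleared when the VM is about to run out of memory, callers that still hold a hard reference to the returned array are unaffected; only otherwise-unreferenced cached arrays can be reclaimed.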