[jira] Created: (LUCENE-1520) OOM errors with CheckIndex with indexes containing a lot of fields with norms

Uwe Schindler (JIRA) Thu, 15 Jan 2009 14:46:29 -0800

OOM errors with CheckIndex with indexes containing a lot of fields with norms
-----------------------------------------------------------------------------


                 Key: LUCENE-1520
                 URL: https://issues.apache.org/jira/browse/LUCENE-1520
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.9
            Reporter: Uwe Schindler


All index readers (SegmentReader, MultiReader, MultiSegmentReader, ...) keep a 
cache of the last used norms. This cache is never cleaned up, so once you 
access the norms of a field, that field's byte[maxDoc()] array is not freed 
until you close or reopen the index.

You can see this problem if you create an index with many fields with norms (I 
tested with about 4,000 fields) and many documents (half a million). If you 
then run CheckIndex, it calls norms() for each (!) field in the segment, and 
each of these calls creates a new cache entry, so you get OutOfMemoryErrors 
after a short time (I tested with the above index: I was not able to complete 
a CheckIndex run even with "-Xmx 16GB" on 64-bit Java).

CheckIndex opens and then tests each segment of an index with a separate 
SegmentReader. The big index with the OutOfMemory problem was optimized, so it 
consisted of one segment with about half a million docs and about 4,000 
fields. Each byte[] array takes about half a MiB for this index. The CheckIndex 
function created the norms for 4,000 fields and the SegmentReader cached them, 
which is about 2 GiB of RAM. So OOMs are not unusual.
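
The estimate above can be reproduced with a few lines of Java. This is only a 
sketch using the rounded figures from this report (500,000 docs, 4,000 fields 
with norms, one norm byte per document per field); the class and method names 
are made up for illustration.

```java
// Rough estimate of the norms cache size; not part of Lucene.
public class NormsCacheEstimate {

    /** Each field with norms costs one byte per document once it is cached. */
    static long cacheBytes(long maxDoc, long fieldsWithNorms) {
        return maxDoc * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long maxDoc = 500000L; // "half a million" docs, rounded
        long fields = 4000L;   // "about 4,000" fields with norms, rounded

        double mib = 1024.0 * 1024.0;
        double gib = mib * 1024.0;

        // Size of a single norms array, then of all arrays held by the cache.
        System.out.printf("per field: %.2f MiB%n", maxDoc / mib);
        System.out.printf("total:     %.2f GiB%n", cacheBytes(maxDoc, fields) / gib);
    }
}
```

With these rounded inputs the result is a little under 0.5 MiB per field and a 
little under 2 GiB in total.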

In my opinion, the best fix would be to use a WeakReference or, better, a 
SoftReference, so that norms.bytes becomes a 
java.lang.ref.SoftReference<byte[]> that is used for caching. With proper 
synchronization (which is already done on the norms cache in SegmentReader), a 
SoftReference is the best choice, as it is cleared by the garbage collector 
only when an OOM may otherwise happen. If the byte[] array is freed (and it is 
only freed if no other references exist), a later call to getNorms() creates a 
new array. When code holds a hard reference to the norms array, it will not be 
freed, so that is no problem. The same could be done for the other 
IndexReaders.
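
A minimal sketch of the SoftReference idea, not the actual patch: the class 
SoftNormsCache and the NormsLoader interface are hypothetical stand-ins for the 
norms cache in SegmentReader and for reading the norms from the index files.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Illustrative only; these names do not exist in Lucene.
public class SoftNormsCache {

    /** Hypothetical stand-in for loading one field's norms from disk. */
    public interface NormsLoader {
        byte[] load(String field);
    }

    private final Map<String, SoftReference<byte[]>> cache =
        new HashMap<String, SoftReference<byte[]>>();
    private final NormsLoader loader;

    public SoftNormsCache(NormsLoader loader) {
        this.loader = loader;
    }

    /** Returns the cached array, reloading it if the GC cleared the reference. */
    public synchronized byte[] getNorms(String field) {
        SoftReference<byte[]> ref = cache.get(field);
        byte[] bytes = (ref == null) ? null : ref.get();
        if (bytes == null) {
            // Never loaded, or cleared under memory pressure: load again.
            bytes = loader.load(field);
            cache.put(field, new SoftReference<byte[]>(bytes));
        }
        // The caller now holds a hard reference, so the array cannot be
        // collected while it is still in use.
        return bytes;
    }
}
```

As long as a caller keeps the returned array, repeated getNorms() calls for 
that field return the same instance; once nobody holds it, the JVM is free to 
reclaim it before throwing an OutOfMemoryError.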

Fields without norms do not have this problem, as all these fields share a 
one-time allocated dummy norm array. So the same index, with norms disabled 
for most of the fields, passed CheckIndex without problems.

I will prepare a patch tomorrow.

Mike proposed another quick fix for CheckIndex:
bq. we could do something first specifically for CheckIndex (eg it could simply 
use the 3-arg non-caching bytes method instead) to prevent OOM errors when 
using it.
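
The general shape of such a non-caching check might look like the sketch 
below. It assumes a reader that can copy a field's norms into a caller-supplied 
buffer; the NormsSource interface and its readNorms method are hypothetical 
names for illustration, not the Lucene method Mike refers to.

```java
import java.util.List;

// Illustrative only; NormsSource is a made-up stand-in for a reader.
public class ReusedBufferCheck {

    /** Hypothetical reader that copies a field's norms into dest at offset. */
    public interface NormsSource {
        void readNorms(String field, byte[] dest, int offset);
    }

    /** Reads every field's norms through one shared buffer; returns the field count. */
    static int checkNorms(NormsSource source, List<String> fields, int maxDoc) {
        byte[] buffer = new byte[maxDoc]; // allocated once, reused for every field
        int checked = 0;
        for (String field : fields) {
            // Nothing is cached here: the next field overwrites the buffer.
            source.readNorms(field, buffer, 0);
            checked++;
        }
        return checked;
    }
}
```

Memory use then stays at one maxDoc-sized array no matter how many fields the 
segment has.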

