Aleksander,

I figured out that most of the heap was consumed by the term cache. In our 
case, the index has 233 million terms, and 6.4 million of them were loaded into 
the cache when we ran the search. I roughly calculated how much memory each 
term needs:
16 bytes for the Term object + 32 bytes for the TermInfo object + 24 bytes for 
the String object holding the term text + 2 * length(char[]) bytes for the term 
text itself.

In our case, the average length of the term text is 25 characters, which means 
each term needs at least 16 + 32 + 24 + 2 * 25 = 122 bytes. The cache for 6.4 
million terms therefore needs about 6.4 million * 122 bytes = 780MB. Adding 
200MB for cached norms, the cache takes more than 980MB of RAM. We worked 
around the term cache issue by setting the index divisor of the IndexReader to 
a higher value. Search performance is still good even with an index divisor 
of 4.
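
The arithmetic above can be reproduced with a small sketch. The per-object sizes are the rough figures quoted in this message, not measured values, and the class and method names are hypothetical. The sketch also assumes that an index divisor of d keeps roughly one in d of the cached terms, which is a simplification.

```java
public class TermCacheEstimate {
    // Rough per-object sizes taken from the message above (JVM dependent).
    static final long TERM_BYTES = 16;
    static final long TERM_INFO_BYTES = 32;
    static final long STRING_BYTES = 24;
    static final long BYTES_PER_CHAR = 2;

    // Estimated heap bytes for one cached term of the given average length.
    static long bytesPerTerm(int avgTermLength) {
        return TERM_BYTES + TERM_INFO_BYTES + STRING_BYTES
                + BYTES_PER_CHAR * avgTermLength;
    }

    // Estimated heap bytes for the whole term cache.
    // Assumption: a divisor of d keeps about 1/d of the loaded terms.
    static long cacheBytes(long loadedTerms, int avgTermLength, int indexDivisor) {
        long termsKept = loadedTerms / indexDivisor;
        return termsKept * bytesPerTerm(avgTermLength);
    }

    public static void main(String[] args) {
        long loadedTerms = 6_400_000L; // terms loaded at search time, from the message
        int avgLength = 25;            // average term length, from the message
        System.out.println("bytes per term: " + bytesPerTerm(avgLength));
        System.out.println("divisor 1: " + cacheBytes(loadedTerms, avgLength, 1) + " bytes");
        System.out.println("divisor 4: " + cacheBytes(loadedTerms, avgLength, 4) + " bytes");
    }
}
```

Under these assumptions, a divisor of 4 would cut the term cache to about a quarter of its original size, at the cost of slower term lookups.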

Thanks,

Zhibin




________________________________
From: Aleksander M. Stensby <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 2:31:04 AM
Subject: Re: how to estimate how much memory is required to support the large 
index search

One major factor that may result in heap space problems is if you are doing any 
form of sorting when searching. Do you have any form of default sort in your 
application? Also, the type of field used for sorting is important with regard 
to memory consumption.
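
As a rough illustration of why the sort field type matters, here is a minimal sketch. It assumes that sort values are cached with one array entry per document, and that a string sort additionally keeps every unique value on the heap. The names and the per-string overhead (24 bytes plus 2 bytes per character, the figure used elsewhere in this thread) are illustrative assumptions, not Lucene internals.

```java
public class SortCacheEstimate {
    // Assumption: one cached primitive value per document
    // (for example 4 bytes for an int or float field, 8 for a long).
    static long numericSortBytes(long maxDoc, int bytesPerValue) {
        return maxDoc * bytesPerValue;
    }

    // Assumption: a 4-byte ordinal per document, plus one String per unique
    // value at roughly 24 bytes of overhead and 2 bytes per character.
    static long stringSortBytes(long maxDoc, long uniqueValues, int avgLength) {
        long ordinals = maxDoc * 4;
        long strings = uniqueValues * (24 + 2L * avgLength);
        return ordinals + strings;
    }

    public static void main(String[] args) {
        long maxDoc = 197_000_000L; // document count reported in the original question
        System.out.println("int sort:  " + numericSortBytes(maxDoc, 4) + " bytes");
        System.out.println("long sort: " + numericSortBytes(maxDoc, 8) + " bytes");
    }
}
```

With 197M documents, even a single int sort field would need several hundred megabytes under these assumptions, and a string field with many unique values would need more.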

This issue has been discussed before on the list. (You can search the archive 
for sorting and memory consumption.)

- Aleksander

On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> I am a beginner with Lucene. We developed an application that creates and
> searches an index using Lucene 2.3.1. We would like to know how to estimate
> how much memory is required to support searching a given index.
> 
> Recently, the index has grown to about 200GB, with 197M documents and 223M
> terms. Our application has started to fail intermittently with
> "OutOfMemoryError: Java heap space" when we use it to search the index. We
> used JProfiler to capture the following memory allocation for a single
> keyword search:
> 
> char[]                                   332MB
> org.apache.lucene.index.TermInfo         194MB
> java.lang.String                         146MB
> org.apache.lucene.index.Term          99,823KB
> org.apache.lucene.index.Term          24,956KB
> org.apache.lucene.index.TermInfo[]    24,956KB
> 
> byte[]                                   188MB
> long[]                                49,912KB
> 
> The memory allocation for the first 6 object types does not change when we
> change the search criteria. Could you please advise which major factors
> affect memory allocation during search, and how exactly they affect it? Is
> it possible to reduce memory usage during search?
> 
> 
> Thank you,
> 
> 
> Zhibin
> 
> 
> 



--Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no
