Calculation looks right. But what's the "Index divisor" that you mentioned?
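
For anyone checking the numbers, here is that estimate as a standalone
sketch. The per-object overheads (16/32/24 bytes) are the estimates from
the mail below, not measured values; real JVM overhead varies by VM,
architecture, and pointer size:

// Sketch: per-term memory estimate using the figures quoted in this thread.
public class TermCacheEstimate {
    public static void main(String[] args) {
        int avgTermChars = 25;              // average term text length from the thread
        long perTerm = 16                   // estimated Term object overhead
                     + 32                   // estimated TermInfo object overhead
                     + 24                   // estimated String object overhead
                     + 2L * avgTermChars;   // term text, 2 bytes per char (UTF-16)
        long cachedTerms = 6400000L;        // terms loaded into the cache
        long totalBytes = perTerm * cachedTerms;
        System.out.println(perTerm + " bytes/term");       // prints: 122 bytes/term
        System.out.println(totalBytes / 1000000 + " MB");  // prints: 780 MB
    }
}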
--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <[EMAIL PROTECTED]> wrote:
> Aleksander,
>
> I figured out that most of the heap was consumed by the term cache. In
> our case, the index has 233 million terms, and 6.4 million of them were
> loaded into the cache when we did the search. I did a rough calculation
> of how much memory each term needs: about 16 bytes for the Term object
> + 32 bytes for the TermInfo object + 24 bytes for the String object
> holding the term text + 2 * length(char[]) for the term text itself.
>
> In our case, the average term text is 25 characters long, so each term
> needs at least 122 bytes. The cache for 6.4 million terms therefore
> needs 6.4M * 122 bytes = ~780MB. Adding 200MB for caching norms, the
> RAM used for caching exceeds 980MB. We worked around the term cache
> issue by setting the index divisor of the IndexReader to a higher
> value. Search performance is still good even with an index divisor
> of 4.
>
> Thanks,
>
> Zhibin
>
> ________________________________
> From: Aleksander M. Stensby <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Monday, November 17, 2008 2:31:04 AM
> Subject: Re: how to estimate how much memory is required to support the
> large index search
>
> One major factor that can cause heap space problems is sorting during
> search. Do you have any form of default sort in your application? The
> type of the field used for sorting also matters for memory consumption.
>
> This issue has been discussed on the list before. (You can search the
> archive for "sorting" and "memory consumption".)
>
> - Aleksander
>
> On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I am a beginner with Lucene. We developed an application that creates
> > and searches an index using Lucene 2.3.1. We would like to know how to
> > estimate how much memory is required to search a given index.
> >
> > Recently, the index has grown to about 200GB, with 197M documents and
> > 223M terms. Our application has started throwing intermittent
> > "OutOfMemoryError: Java heap space" errors when we use it to search
> > the index. We used JProfiler to capture the following memory
> > allocation during a single keyword search:
> >
> > char[]                               332MB
> > org.apache.lucene.index.TermInfo     194MB
> > java.lang.String                     146MB
> > org.apache.lucene.index.Term         99,823KB
> > org.apache.lucene.index.Term[]       24,956KB
> > org.apache.lucene.index.TermInfo[]   24,956KB
> >
> > byte[]                               188MB
> > long[]                               49,912KB
> >
> > The memory allocated for the first six object types does not change
> > when we change the search criteria. Could you please advise which
> > major factors affect memory allocation during search, and how
> > precisely they affect memory usage? Is it possible to reduce the
> > memory usage of searching?
> >
> > Thank you,
> >
> > Zhibin
>
> --
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
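
For anyone wondering what the index divisor workaround looks like in
code, here is a minimal sketch. It assumes Lucene 2.4's
IndexReader.setTermInfosIndexDivisor() (2.3.1 does not appear to expose
this setter, so an upgrade may be needed); the class name and the index
path "/path/to/index" are placeholders. The divisor must be set before
the first search, i.e. before the term index is loaded:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class IndexDivisorExample {
    public static void main(String[] args) throws Exception {
        // "/path/to/index" stands in for the real index directory.
        IndexReader reader =
            IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
        // Keep only every 4th entry of the in-memory term index: roughly
        // 1/4 of the Term/TermInfo cache, at the cost of slower lookups.
        reader.setTermInfosIndexDivisor(4);
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run searches with `searcher` as usual ...
        searcher.close();
        reader.close();
    }
}

With a divisor of N, the reader holds only every Nth indexed term in RAM,
so the term index memory drops by roughly a factor of N, while each term
lookup may scan up to N * indexInterval entries instead of indexInterval.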