The term, terminfo, indexreader internals stuff is prob on the low end compared to the size of your field caches (needed for sorting). If you are sorting by String I think the space needed is 32 bits x number of docs + an array to hold all of the unique terms. So checking 300 million docs (I know you are actually breaking it up smaller than that, but for example) and ignoring things like String chars being variable byte lengths and storing the length, etc and randomly picking 50000 unique terms at 6 chars per:

32 bits x 300000000 + 50000 x 6 x 16 bits to MB = 1 144.98138 megabytes

Thats per field your sorting on. If you are sorting on an int field it should be closer to 32 bits x num docs - shorts, 32 bits x num docs, etc.

So you have those field caches, plus the IndexReader terminfo, term stuff, plus whatever RAM your app needs beyond Lucene. 4 gig might just not *quite* cut it is my guess.

Todd Benge wrote:
There's usually only a couple sort fields and a bunch of terms in the
various indices.  The terms are user entered on various media so the
number of terms is very large.

Thanks for the help.

Todd



On 10/29/08, Todd Benge <[EMAIL PROTECTED]> wrote:
Hi,

I'm the lead engineer for search on a large website using lucene for search.

We're indexing about 300M documents in ~ 100 indices.  The indices add
 up to ~ 60G.

The indices are sorted into 4 different Multisearcher with the largest
handling ~50G.

The code is basically like the following:

private static MultiSearcher searcher;

public void init(File files) {

     IndexSearcer [] searchers = new IndexSearcher[files.length] ();
     int i = 0;
     for ( File file: files ) {
          searchers[i++] = new IndexSearcher(FSDirectory.getDirectory(file);
     }

searcher = new MultiSearcher(searchers);
}

public Searcher getSearcher() {
   return searcher;
}

We're seeing a high cache rate with Term & TermInfo in Lucene 2.4.
Performance is good but servers are consistently hanging with
OutOfMemory errors.

We're allocating 4G in the heap to each server.

Is there any way to control the amount of memory Lucene consume for
caching?  Any other suggestions on fixing the memory errors?

Thanks,

Todd




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to