On Wed, Feb 4, 2009 at 10:41 AM, Mark Miller <markrmil...@gmail.com> wrote:
> Todd Benge wrote: > >> >> The intent is to reduce the amount of memory that is held in cache. As it >> is now, it looks like there is an array of comparators for each index >> reader. Most of the data in the array appears to be the same for each >> cache >> so there is duplication for each type ( string, float). >> >> > Use an array cachekey and override it as not mergeable. > > I suppose in terms of the unuiqes terms array, you could see some > duplication. > > I don't think there should be much duplication though - in the non String > cases, each SegmentIndexReader will only hold the values for itself. The > size of the sub arrays would be the same as the full array. > In the String case, you will have duplicates for the unique terms array, so > if you have a lot, that may cause issues, but the ordinal array will not be > any larger. And the unuiqe terms array shouldnt be terrible - the number of > terms per segment should drop logarithmically. I'm not sure you'll see much > of a difference, and it would only be with String sorts. > > That is, unless you are creating your own separate FieldCaches on > multisegmentreaders - then you would double everything. > > >> Yes - we're runnning about 80G in the indices so there's not enough RAM >> for >> all the data in the fieldcache. >> >> > That is a large index. Can you share how many documents? I don't have the exact number but I think it's 200 - 250 million documents. I'll see if I can get some more realistic numbers and re-post. Thanks for the help. Todd > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >