On Mon, Jun 29, 2009 at 9:33 AM, Nigel <nigelspl...@gmail.com> wrote:
> Ah, I was confused by the index divisor being 1 by default: I thought it
> meant that all terms were being loaded. I see now in SegmentTermEnum that
> the every-128th behavior is implemented at a lower level.
>
> But I'm even more confused about why we have so many terms in memory. A
> heap dump shows over 270 million TermInfos, so if that's only 1/128th of
> the total then we REALLY have a lot of terms. (-: We do have a lot of docs
> (about 250 million), and we do have a couple of unique per-document values,
> but even so I can't see how we could get to 270 million x 128 terms. (The
> heap dump numbers are stable across the index close-and-reopen cycle, so I
> don't think we're leaking.)
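Right: the divisor multiplies that every-128th sampling, so only every
(128 * divisor)th term is held in RAM. If the term index itself turns out
to be the memory hog, you can raise the divisor when you open a reader.
A minimal sketch, assuming a release that has
IndexReader.setTermInfosIndexDivisor (check the javadocs for your version;
the class name and index path below are placeholders):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.store.FSDirectory;

  public class TermIndexDivisorDemo {
    public static void main(String[] args) throws IOException {
      // Open a reader over the index (path is a placeholder).
      IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
      // Load only every 4th indexed term: with the default index interval
      // of 128 that keeps every 512th unique term in RAM, at the cost of
      // slower term lookups. Set it before the first term lookup, since
      // the in-memory term index is loaded lazily on first use.
      reader.setTermInfosIndexDivisor(4);
      // ... run searches as usual ...
      reader.close();
    }
  }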
You could use CheckIndex to see how many terms are actually in your index.

If you take the heap dump right after opening a fresh reader, before
running any searches, do you still see 270 million TermInfos?
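If it helps, CheckIndex can be run from the command line, something like
this (the jar name and index path are placeholders; adjust the classpath
for your setup):

  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index

Each segment's report should include the unique term count in its
"test: terms, freq, prox..." line.

Mike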