Hi,

If it is really the case that every 128th term is loaded into memory, could you use a relational database or a B-tree index to do the term indexing instead?
Even if you create another level of indexing on top of the .tii file, it is just a hack and would not scale well. I would think a B-tree/B+ tree based approach is the way to go for better memory utilization.

Cheers,

Jian

On Sat, 22 Jan 2005 08:32:50 -0800 (PST), Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> There, Kevin, that's what I was referring to: the .tii file.
>
> Otis
>
> --- Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> > On Saturday 22 January 2005 01:39, Kevin A. Burton wrote:
> > > Kevin A. Burton wrote:
> > >
> > > > We have one large index right now... it's about 60G ... When I open it
> > > > the Java VM used 940M of memory. The VM does nothing else besides
> > > > open this index.
> > >
> > > After thinking about it I guess 1.5% of memory per index really isn't
> > > THAT bad. What would be nice is if there was a way to do this from disk
> > > and then use a buffer (either via the filesystem or in-VM memory) to
> > > access these variables.
> >
> > It's even documented. From:
> > http://jakarta.apache.org/lucene/docs/fileformats.html :
> >
> > >The term info index, or .tii file.
> > >This contains every IndexIntervalth entry from the .tis file, along with its
> > >location in the "tis" file. This is designed to be read entirely into memory
> > >and used to provide random access to the "tis" file.
> >
> > My guess is that this is what you see happening.
> > To see the actual .tii file, you need the non-default file format.
> >
> > Once searching starts you'll also see that the field norms are loaded;
> > these take one byte per searched field per document.
> >
> > > This would be similar to the way the MySQL index cache works...
> >
> > It would be possible to add another level of indexing to the terms.
> > No one has done this yet, so I guess it's preferred to buy RAM
> > instead...
> >
> > Regards,
> > Paul Elschot
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
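
For anyone following the thread, here is a rough Python sketch of the lookup scheme that the quoted file-format documentation describes: keep every Nth term in memory, binary search that sample, then scan a short run of the full term list. It is only a toy model and not Lucene's code. The names `SparseTermIndex` and `INDEX_INTERVAL` are made up for illustration, the interval of 128 is taken from the "every 128th term" figure above, and a plain Python list stands in for the on-disk .tis file.

```python
import bisect

INDEX_INTERVAL = 128  # assumed sampling interval ("every 128th term")


class SparseTermIndex:
    """Toy model of a sampled term index (hypothetical, not Lucene's API)."""

    def __init__(self, sorted_terms, interval=INDEX_INTERVAL):
        # Stand-in for the full, sorted, on-disk term dictionary (.tis).
        self.terms = sorted_terms
        self.interval = interval
        # Only every interval-th term is held in memory (the .tii role).
        self.sampled = sorted_terms[::interval]

    def lookup(self, term):
        """Return the position of `term` in the full list, or -1 if absent."""
        # Binary search the in-memory sample for the last sampled term <= term.
        slot = bisect.bisect_right(self.sampled, term) - 1
        if slot < 0:
            return -1  # term sorts before the first entry
        start = slot * self.interval
        end = min(start + self.interval, len(self.terms))
        # Scan at most `interval` entries of the full list sequentially.
        for pos in range(start, end):
            if self.terms[pos] == term:
                return pos
            if self.terms[pos] > term:
                break
        return -1


terms = [f"term{i:06d}" for i in range(10_000)]
index = SparseTermIndex(terms)
print(len(index.sampled))          # number of terms held in memory
print(index.lookup("term004242"))  # position in the full term list
```

In this model the in-memory part grows as the number of terms divided by the interval, so a larger interval trades memory for a longer sequential scan per lookup.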