Cool. I'll give it a try. Looks like extending FilterIndexReader is the way to go. Or possibly I could cache the compressed form at a lower level, getting the best of both worlds. I'll look into both approaches, profile the app, and post my results.
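For what it's worth, here is a rough sketch of the caching side of that idea: an LRU map from term text to parsed posting lists (parallel docNo/freq arrays), which a FilterIndexReader subclass could consult before falling back to the on-disk TermDocs. The class and field names here are hypothetical, and this uses only the JDK rather than real Lucene APIs; eviction rides on LinkedHashMap's access-order mode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU cache of parsed postings keyed by term text.
// Not a real Lucene class; a FilterIndexReader subclass could check a cache
// like this before reading TermDocs from disk.
class TermDocsCache {
    // One cached posting list: parallel arrays of doc numbers and frequencies.
    static final class Postings {
        final int[] docs;
        final int[] freqs;
        Postings(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }
    }

    private final Map<String, Postings> cache;

    TermDocsCache(final int maxEntries) {
        // accessOrder=true makes LinkedHashMap iterate least-recently-used
        // first; removeEldestEntry evicts once the cap is exceeded.
        this.cache = new LinkedHashMap<String, Postings>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Postings> eldest) {
                return size() > maxEntries;
            }
        };
    }

    Postings get(String term) { return cache.get(term); }
    void put(String term, Postings p) { cache.put(term, p); }
    int size() { return cache.size(); }

    public static void main(String[] args) {
        TermDocsCache c = new TermDocsCache(2);
        c.put("lucene", new Postings(new int[]{1, 5}, new int[]{2, 1}));
        c.put("cache", new Postings(new int[]{3}, new int[]{1}));
        c.get("lucene");  // touch "lucene" so "cache" becomes the eldest entry
        c.put("index", new Postings(new int[]{7}, new int[]{4}));
        System.out.println(c.get("cache") == null);   // evicted -> true
        System.out.println(c.get("lucene").docs[1]);  // 5
    }
}
```

The popular-term skew Doug describes below is exactly what an LRU policy exploits: frequently searched terms stay resident while rare ones fall out.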
----- Original Message -----
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, July 27, 2004 8:33 PM
Subject: Re: Caching of TermDocs

> John Patterson wrote:
> > I would like to hold a significant amount of the index in memory but
> > use the disk index as a spill over. Obviously the best situation is to
> > hold in memory only the information that is likely to be used again
> > soon. It seems that caching TermDocs would allow popular search terms
> > to be searched more efficiently while the less common terms would need
> > to be read from disk.
>
> The operating system already caches recent disk i/o. So what you'd save
> primarily would be the overhead of parsing the data. However, the parsed
> form, a sequence of docNo and freq ints, is nearly eight times as large
> as its compressed size in the index. So your cache would consume a lot
> of memory.
>
> Whether this provides much overall speedup depends on the distribution
> of common terms in your query traffic. If you have a few terms that are
> searched very frequently then it might pay off. In my experience with
> general-purpose search engines this is not usually the case: folks seem
> to use rarer words in queries than they do in ordinary text. But in
> some search applications perhaps the traffic is more skewed. Only some
> experiments would tell for sure.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
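P.S. A back-of-envelope check of the "nearly eight times" figure Doug quotes: in memory each posting is two 4-byte ints (docNo + freq), while on disk Lucene stores delta-encoded VInts, which for dense posting lists often come to roughly one byte per posting. The posting count and one-byte-compressed figure below are illustrative assumptions, not measurements of a real index.

```java
// Illustrative arithmetic only: compares the in-memory size of parsed
// postings (two 4-byte ints each) against an assumed ~1 byte/posting
// for the VInt-compressed form on disk.
class PostingMemoryEstimate {
    public static void main(String[] args) {
        long postings = 1000000L;                // hypothetical posting count
        long parsedBytes = postings * (4 + 4);   // docNo int + freq int
        long compressedBytes = postings * 1;     // ~1 byte/posting as VInts
        System.out.println("parsed:     " + parsedBytes + " bytes");
        System.out.println("compressed: " + compressedBytes + " bytes");
        System.out.println("ratio:      " + (parsedBytes / compressedBytes) + "x");
    }
}
```

So a cache holding even a modest slice of the index in parsed form costs several times what the same postings cost on disk, which is the memory trade-off Doug is pointing at.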
