John Patterson wrote:
I would like to hold a significant amount of the index in memory but use the
disk index as a spillover. Obviously the best situation is to hold in
memory only the information that is likely to be used again soon.  It seems
that caching TermDocs would allow popular search terms to be searched more
efficiently while the less common terms would need to be read from disk.

The operating system already caches recent disk I/O, so what you'd primarily save is the overhead of parsing the data. However, the parsed form, a sequence of docNo and freq ints, is nearly eight times as large as its compressed size in the index, so a cache like that would consume a lot of memory.
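
If you do want to experiment with it, the cache itself could be as simple as an LRU map from a term to its parsed postings. Something like the following (a rough, untested sketch; the class and field names are made up for illustration, but IndexReader.termDocs() and TermDocs.next()/doc()/freq() are the real calls):

  import java.io.IOException;
  import java.util.LinkedHashMap;
  import java.util.Map;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;

  /** Sketch of an LRU cache of parsed postings, keyed by term.  Each entry
   *  holds parallel docNo/freq arrays, which is why it costs roughly eight
   *  times the compressed on-disk size of the same postings. */
  public class TermDocsCache {

    /** Parsed postings for one term. */
    public static class Postings {
      public final int[] docs;
      public final int[] freqs;
      Postings(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }
    }

    private final IndexReader reader;
    private final Map<Term,Postings> cache;

    public TermDocsCache(IndexReader reader, final int maxEntries) {
      this.reader = reader;
      // an access-ordered LinkedHashMap gives simple LRU eviction
      this.cache = new LinkedHashMap<Term,Postings>(16, 0.75f, true) {
        protected boolean removeEldestEntry(Map.Entry<Term,Postings> eldest) {
          return size() > maxEntries;
        }
      };
    }

    /** Return postings for a term, reading and parsing them only on a miss. */
    public synchronized Postings get(Term term) throws IOException {
      Postings cached = cache.get(term);
      if (cached != null)
        return cached;                      // popular term: no read, no parse

      int[] docs = new int[16], freqs = new int[16];
      int n = 0;
      TermDocs td = reader.termDocs(term);
      try {
        while (td.next()) {
          if (n == docs.length) {           // grow the parallel arrays
            docs = grow(docs);
            freqs = grow(freqs);
          }
          docs[n] = td.doc();
          freqs[n] = td.freq();
          n++;
        }
      } finally {
        td.close();
      }

      int[] d = new int[n], f = new int[n];
      System.arraycopy(docs, 0, d, 0, n);
      System.arraycopy(freqs, 0, f, 0, n);
      Postings result = new Postings(d, f);
      cache.put(term, result);
      return result;
    }

    private static int[] grow(int[] a) {
      int[] b = new int[2 * a.length];
      System.arraycopy(a, 0, b, 0, a.length);
      return b;
    }
  }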


Whether this provides much overall speedup depends on the distribution of common terms in your query traffic. If a few terms are searched very frequently, then it might pay off. In my experience with general-purpose search engines this is not usually the case: folks seem to use rarer words in queries than they do in ordinary text. But in some search applications the traffic may be more skewed. Only some experiments would tell for sure; a rough sketch of one is below.
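
A crude way to run that experiment is to count term frequencies in a query log and see how much of the traffic the most frequent terms cover. Something like this (again only a sketch; it assumes one whitespace-separated query per line in the log file):

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.StringTokenizer;

  /** Count how often each term occurs in a query log to see how skewed the
   *  traffic is.  Assumes one whitespace-separated query per line. */
  public class QueryTermCounter {
    public static void main(String[] args) throws Exception {
      Map<String,Integer> counts = new HashMap<String,Integer>();
      int total = 0;
      BufferedReader in = new BufferedReader(new FileReader(args[0]));
      for (String line = in.readLine(); line != null; line = in.readLine()) {
        StringTokenizer tok = new StringTokenizer(line.toLowerCase());
        while (tok.hasMoreTokens()) {
          String term = tok.nextToken();
          Integer c = counts.get(term);
          counts.put(term, c == null ? 1 : c + 1);
          total++;
        }
      }
      in.close();

      // what share of all query-term occurrences do the top 100 terms cover?
      List<Integer> values = new ArrayList<Integer>(counts.values());
      Collections.sort(values, Collections.reverseOrder());
      int top = Math.min(100, values.size());
      long covered = 0;
      for (int i = 0; i < top; i++)
        covered += values.get(i);
      System.out.println(top + " most frequent terms cover "
          + (100.0 * covered / total) + "% of query-term occurrences");
    }
  }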

Doug
