On Sun, Jun 28, 2009 at 9:08 PM, Nigel<nigelspl...@gmail.com> wrote: >> Unfortunately the TermInfos must still be hit to look up the >> freq/proxOffset in the postings files. > > But for that data you only have to hit the TermInfos for the terms you're > searching, correct? So, assuming that there are vastly more terms in the > index than you ever actually use in a query, we could rely on the LRU cache > to keep the queried TermInfos around, rather than loading all of them > up-front. This was a hypothesis based on some tracing through the code but > not a lot of knowledge of Lucene internals, so please steer me back to > reality if necessary. (-:
Right, it's only the terms in your query that need to be looked up. There's already an LRU cache for the Term -> TermInfos lookup, in TermInfosReader (hard wired to size 1024). It was created as a "short term" cache so that queries that look up the same term twice (eg once for idf and once to get freq/prox postings position), would result in only one real lookup. You could try privately experimenting w/ that value to see if you can get a reasonable it rate? Lucene only loads the "indexed" terms (= every 128th) into RAM, by default. Mike Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org