On Sun, Jun 28, 2009 at 9:08 PM, Nigel<nigelspl...@gmail.com> wrote:
>> Unfortunately the TermInfos must still be hit to look up the
>> freq/proxOffset in the postings files.
>
> But for that data you only have to hit the TermInfos for the terms you're
> searching, correct?  So, assuming that there are vastly more terms in the
> index than you ever actually use in a query, we could rely on the LRU cache
> to keep the queried TermInfos around, rather than loading all of them
> up-front.  This was a hypothesis based on some tracing through the code but
> not a lot of knowledge of Lucene internals, so please steer me back to
> reality if necessary.  (-:

Right, it's only the terms in your query that need to be looked up.

There's already an LRU cache for the Term -> TermInfos lookup, in
TermInfosReader (hard wired to size 1024).  It was created as a "short
term" cache so that queries that look up the same term twice (eg once
for idf and once to get freq/prox postings position), would result in
only one real lookup.  You could try privately experimenting w/ that
value to see if you can get a reasonable it rate?

Lucene only loads the "indexed" terms (= every 128th) into RAM, by default.

Mike

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to