Doron Cohen wrote:
The collect() method is going to be invoked once for each document that matches the query (having a nonzero score). If the index is very large, that may turn out to be a very large number of calls. Often, search applications fetch additional data (doc fields) for only a small subset of the documents matching a query - e.g. the first page (0-9), the second page (10-19), etc. But if your application is going to fetch in an exhaustive manner, and especially for a short field like DB_ID, it sometimes makes sense to cache the entire field in memory (its values for all the docs) for the life of the index reader/searcher. The collect() method can then use that cached data instead of fetching each document.
That's an excellent idea! We cannot easily change our client implementation, so we have to support exhaustive retrieval for now, although I do limit the absolute maximum number of hits that will be returned. We are hoping to implement paging in a later client version.
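Something along these lines is roughly what I'm picturing - just a sketch, and it assumes DB_ID is indexed as a single untokenized term per document so Lucene's FieldCache can load it, and that we collect via the HitCollector API:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;

// Collects the DB_ID of every matching doc without calling
// IndexReader.document() per hit. FieldCache builds the String[]
// once per reader and reuses it for the life of that reader.
public class DbIdCollector extends HitCollector {
    private final String[] dbIds;              // one entry per doc id
    private final List matches = new ArrayList();

    public DbIdCollector(IndexReader reader) throws IOException {
        this.dbIds = FieldCache.DEFAULT.getStrings(reader, "DB_ID");
    }

    public void collect(int doc, float score) {
        matches.add(dbIds[doc]);               // plain array lookup, no doc fetch
    }

    public List getMatches() {
        return matches;
    }
}

and then searcher.search(query, new DbIdCollector(reader)) rather than iterating Hits.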
I'm not sure I can cache all the GUIDs, though. A GUID is 20 bytes and there are two of them that would need to be cached, and the document count could be up to 100M, though in most cases <20M. I am already keeping a per-user BitSet filter cache for each user's mail searcher, so I could extend that to also cache all the IDs for that user, give that cache a shortish lifetime, and/or limit the total cache size. That would really help.
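(Rough numbers, just to size it: 2 GUIDs x 20 bytes x 100M docs is about 4 GB of raw value data, and about 800 MB at 20M docs, before any per-String object overhead - hence wanting to scope the cache per user and cap its total size.)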
I'll have a play - thanks for the input.

Antony