Doron Cohen wrote:
The collect() method is going to be invoked once for each document that matches the query (having a nonzero score). If the index is very large, that may turn out to be a very large number of calls. Often, search applications fetch additional data (doc fields) for only a small subset of the documents matching a query - e.g. the first page (0-9), the second page (10-19), etc. But if your application is going to fetch in an exhaustive manner, and especially for a short field like DB_ID, it sometimes makes sense to cache the entire field in memory (its values for all the docs) for the life of the index reader/searcher. The collect() method can then use that cached data instead of fetching each document.
That's an excellent idea! We cannot easily change our client implementation, so we have to support exhaustive retrieval for now, although I do limit the absolute maximum number of hits that will be returned. We are hoping to implement paging in a later client version.
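Something along these lines is roughly what I'm picturing - just a sketch, and it assumes DB_ID is indexed as a single untokenized term per document so Lucene's FieldCache can load it, and that we collect via the HitCollector API:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;

// Collects the DB_ID of every matching doc without calling
// IndexReader.document() per hit. FieldCache builds the String[]
// once per reader and reuses it for the life of that reader.
public class DbIdCollector extends HitCollector {
    private final String[] dbIds;              // one entry per doc id
    private final List matches = new ArrayList();

    public DbIdCollector(IndexReader reader) throws IOException {
        this.dbIds = FieldCache.DEFAULT.getStrings(reader, "DB_ID");
    }

    public void collect(int doc, float score) {
        matches.add(dbIds[doc]);               // plain array lookup, no doc fetch
    }

    public List getMatches() {
        return matches;
    }
}

and then searcher.search(query, new DbIdCollector(reader)) rather than iterating Hits.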
I'm not sure I can cache all the GUIDs, though. A GUID is 20 bytes and there are two of them that would need to be cached, and the document count could be up to 100M, though in most cases <20M. I am already keeping a per-user BitSet filter cache for each user's mail searcher, so I could extend that to also cache all the IDs for that user, give that cache a shortish lifetime, and/or limit the total cache size. That would really help.
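(Rough numbers, just to size it: 2 GUIDs x 20 bytes x 100M docs is about 4 GB of raw value data, and about 800 MB at 20M docs, before any per-String object overhead - hence wanting to scope the cache per user and cap its total size.)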
I'll have a play - thanks for the input.

Antony