I was thinking of the aggressive version with an index-time solution, although I don't know the Lucene architecture for distributed indexing and searching well enough to formulate the idea precisely. Conceptually, I'd like each server that owns a slice of the index in a distributed environment to have the complete docFreq data, i.e., docFreqs that represent the collection as a whole, not just its own index slice. If this were achieved at index time, then the current implementation would work at query time: MultiSearch could send the queries out to the remote Searchers, and these Searchers could consult their local indexes for the correct docFreqs to use.
This is different from what I described. I described keeping a docFreq cache at the central dispatch node, while you describe replicating that cache on every search node. I don't see the advantage of this replication. A single cache is both more efficient to maintain and faster to search, since fewer dictionary lookups are involved.
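To make the central-cache idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: `Slice` and `Dispatcher` are hypothetical stand-ins for a search node and the dispatch node, and it assumes the collection-wide docFreq of a term is simply the sum of the per-slice docFreqs. The point it illustrates is that each slice's dictionary is consulted at most once per term, after which the dispatcher answers from its own cache.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocFreqCacheSketch {

    /** Hypothetical stand-in for one search node holding a slice of the index. */
    static final class Slice {
        private final Map<String, Integer> localDocFreq;
        int lookups = 0; // counts dictionary lookups made against this slice

        Slice(Map<String, Integer> localDocFreq) {
            this.localDocFreq = localDocFreq;
        }

        /** Number of documents in this slice that contain the term. */
        int docFreq(String term) {
            lookups++;
            return localDocFreq.getOrDefault(term, 0);
        }
    }

    /** Hypothetical dispatch node that keeps the single, collection-wide cache. */
    static final class Dispatcher {
        private final List<Slice> slices;
        private final Map<String, Integer> globalDocFreq = new HashMap<>();

        Dispatcher(List<Slice> slices) {
            this.slices = slices;
        }

        /** Collection-wide docFreq: summed over slices on first use, cached afterwards. */
        int docFreq(String term) {
            Integer cached = globalDocFreq.get(term);
            if (cached != null) {
                return cached; // no slice is contacted on a cache hit
            }
            int sum = 0;
            for (Slice slice : slices) {
                sum += slice.docFreq(term); // assumption: global value is the sum of local values
            }
            globalDocFreq.put(term, sum);
            return sum;
        }
    }

    public static void main(String[] args) {
        Slice a = new Slice(Map.of("lucene", 3, "index", 5));
        Slice b = new Slice(Map.of("lucene", 4));
        Dispatcher dispatcher = new Dispatcher(List.of(a, b));

        System.out.println(dispatcher.docFreq("lucene")); // first call sums over both slices
        System.out.println(dispatcher.docFreq("lucene")); // second call is served from the cache
        System.out.println(a.lookups + b.lookups);        // total slice lookups so far
    }
}
```

In the replicated design, every search node would have to hold and update a copy of `globalDocFreq`; here only the dispatcher does, and it would pass the cached values along with the query. The sketch ignores cache invalidation when a slice is updated.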
Doug