Doug Cutting wrote:

>>From: Scott Ganyo [mailto:[EMAIL PROTECTED]]
>>
>>But this:
>>
>>Document[] getDocs(int[] i) throws IOException;
>>
>>still retrieves full documents from the remote index.
>>
>
>In my thinking, this would only be called for documents that are explicitly
>requested with Hits.doc().  I was not thinking that distributed search would
>support the "low-level" interface, but just the Hits interface.  For each
>search, two calls would be made per remote index, one to get query term
>statistics, and one to get the top-scoring document numbers and scores.
>These can be merged, and then only the globally top-scoring document objects
>need be retrieved, as they are displayed.
>

I think Scott's point was that retrieving whole documents is still too much
work, and that perhaps only a few fields need be retrieved. For example, to
present a search results page with titles and summaries, those two fields are
all one would need, whereas a document might also store its full text or
other fields meant for other types of processing.

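To make the "only a few fields" idea concrete, here is a minimal sketch of what such a call could look like. Everything in it is an assumption for illustration: the FieldFetcher interface, its getFields method, and the in-memory stand-in are hypothetical names, not part of Lucene, and the stored fields are modelled as plain string maps rather than real Document objects.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldProjectionSketch {

    /** Hypothetical remote call (not an existing Lucene interface). */
    interface FieldFetcher {
        // For each document number, return only the named stored fields.
        List<Map<String, String>> getFields(int[] docNumbers, String[] fieldNames)
                throws IOException;
    }

    /** In-memory stand-in for the side where the index resides. */
    static class InMemoryFetcher implements FieldFetcher {
        private final Map<Integer, Map<String, String>> stored;

        InMemoryFetcher(Map<Integer, Map<String, String>> stored) {
            this.stored = stored;
        }

        @Override
        public List<Map<String, String>> getFields(int[] docNumbers, String[] fieldNames) {
            List<Map<String, String>> result = new ArrayList<>();
            for (int doc : docNumbers) {
                Map<String, String> all = stored.get(doc);
                Map<String, String> subset = new LinkedHashMap<>();
                if (all != null) {
                    // Copy only the requested fields; the rest never leave this side.
                    for (String name : fieldNames) {
                        String value = all.get(name);
                        if (value != null) {
                            subset.put(name, value);
                        }
                    }
                }
                result.add(subset); // unknown documents yield an empty map
            }
            return result;
        }
    }

    /** Builds a one-document sample "index" (document number 7 is arbitrary). */
    static InMemoryFetcher sampleIndex() {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Distributed search");
        doc.put("summary", "Merging hits from several indexes");
        doc.put("body", "a long full-text field that a results page never shows");
        Map<Integer, Map<String, String>> stored = new HashMap<>();
        stored.put(7, doc);
        return new InMemoryFetcher(stored);
    }

    public static void main(String[] args) throws IOException {
        FieldFetcher remote = sampleIndex();
        // A results page asks only for what it displays.
        List<Map<String, String>> rows =
                remote.getFields(new int[] {7}, new String[] {"title", "summary"});
        for (Map<String, String> row : rows) {
            System.out.println(row);
        }
    }
}
```

The design choice being illustrated is simply that the field selection happens on the index side, so a large stored field such as the body is never serialized for a results page.
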
Another point is that some hit collectors choose to retrieve documents during
scoring, however expensive that may be, in order to do some custom scoring or
sorting or whatever. In this case, it would also help if such collectors could
be "shipped" over to where the index resides and do their job there, so that
at least the documents don't have to be moved across the wire.

Dmitry
