On Nov 8, 2012, at 11:30 AM, Robert Muir <[email protected]> wrote:
> Why are you retrieving thousands of stored fields?

Thanks everybody for the responses, and much more of the same for a great project.

I do not think it is all that rare that people actually do something with the information rather than just display summaries. Clustering in Solr does exactly that, and online record linkage follows exactly the same pattern. A "fetch thousands of candidates and run some heavy processing on them" pattern is certainly not typical "web search engine" usage, but philosophically, the model a) search the data, b) do something with it, c) deliver the result is not that strange, is it? You say b) should not be done using stored fields; OK, I trust you, but going to a database/NoSQL/anything else is even slower. What approach would you recommend?
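To make the pattern concrete, this is roughly the shape of it (just a sketch against the plain 4.x search API; the field names, the candidate count and doLinkage() are all made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class CandidateProcessor {
  public void run(Directory dir, Query query) throws Exception {
    DirectoryReader reader = DirectoryReader.open(dir);
    try {
      IndexSearcher searcher = new IndexSearcher(reader);
      // a) search: collect a few thousand candidates, not ten
      TopDocs hits = searcher.search(query, 5000);
      for (ScoreDoc sd : hits.scoreDocs) {
        // b) load the stored fields of every candidate...
        Document doc = searcher.doc(sd.doc);
        // ...and run the heavy processing on them (record linkage in our case)
        doLinkage(doc.get("key"), doc.get("payload"));
      }
      // c) deliver whatever the processing produced
    } finally {
      reader.close();
    }
  }

  private void doLinkage(String key, String payload) {
    // application-specific, omitted
  }
}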
> the probability of two documents of the same results page being in the same chunk is very low.

Adrien, Robert, this is 100% correct, no objection there. In this particular case we rely heavily on locality of reference: we simply sort the data and reindex from time to time. You have to be lucky enough to be able to sort the documents, but we do not use Lucene for big chunks of text, rather for almost fully structured data, and we know how to sort this data to preserve locality of reference… Also a bit unusual, but I do not think it is all that rare a scenario. Sorting the data (where possible) was a great optimisation tip for many applications even before compression; there is a small sketch of what I mean right below.

> really you should roll your own codec for this and specialise.

Yes, I have already started thinking about it, but we will first try to play with the chunk size to see if we can achieve the goal without our own codec… (see the second sketch below).
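For the record, the sort-and-reindex step mentioned above is nothing more than this (a minimal sketch; the Record type and field names are invented):

import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;

public class SortedReindexer {
  static class Record {
    String key;     // the key our queries cluster on
    String payload; // the structured data we post-process
  }

  public void reindex(IndexWriter writer, List<Record> records) throws Exception {
    // Sort by the key the queries cluster on, so that documents which are
    // fetched together also get written (and compressed) next to each other.
    Collections.sort(records, new Comparator<Record>() {
      public int compare(Record a, Record b) {
        return a.key.compareTo(b.key);
      }
    });
    for (Record r : records) {
      Document doc = new Document();
      doc.add(new StringField("key", r.key, Store.YES));
      doc.add(new StoredField("payload", r.payload));
      writer.addDocument(doc);
    }
    writer.commit();
  }
}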

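And for the chunk-size experiment, something along these lines is what I have in mind, assuming a current 4.x trunk snapshot where CompressingStoredFieldsFormat takes a chunkSize and FilterCodec delegates the rest (the class names, the format name and the 256 KB value are mine, and the exact constructors may differ in your checkout):

import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressionMode;
import org.apache.lucene.codecs.lucene41.Lucene41Codec;

public class BigChunkCodec extends FilterCodec {
  public BigChunkCodec() {
    super("BigChunkCodec", new Lucene41Codec());
  }

  @Override
  public StoredFieldsFormat storedFieldsFormat() {
    // Bigger chunks raise the odds that documents we sorted next to each
    // other share a chunk, at the price of more decompression per lookup.
    return new CompressingStoredFieldsFormat("BigChunkStoredFields",
        CompressionMode.FAST, 1 << 18);
  }
}

It would be plugged in with IndexWriterConfig.setCodec(new BigChunkCodec()), and if I read the codec framework right the name also has to be registered through SPI (META-INF/services/org.apache.lucene.codecs.Codec) so that readers can resolve it later.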