On Thu, Apr 10, 2014 at 11:13 PM, Nikolas Everett <[email protected]> wrote:
> This one is easy. Elasticsearch/Lucene has to keep a min heap of all the
> documents you find and the score that is from + size big. Technically it
> is min(from + size, max(rescore_window_size)). Anyway, that means some
> part of the query has O(n) space and O(n * log(n)) time complexity where n
> is from + size. That part might be dwarfed by some other action but it is
> there. And technically in the worst case the time complexity is more like
> O(hits * log(n)) but that's not likely.
>

Everything that Nikolas said is correct. I'd like to add that starting with
Elasticsearch 1.2.0, paging with scroll is going to be more efficient[1],
since the worst case will be O(hits * log(size)) instead of
O(hits * log(from + size)). If you are interested in why this is possible:
on each shard, scroll keeps track of the last (least competitive) document
among the hits of the previous page, so that documents which sort ahead of
it, and were therefore already returned, can simply be ignored instead of
being added to the priority queue.

The issue with realtime is that it creates lots of segments, which usually
get merged very quickly. Scroll, on the other hand, works by asking each
shard to keep open the view of the index that was used for the first page
until the scroll is closed. This can delay space reclamation and force
Elasticsearch to keep a significant number of files open (beware of running
out of file descriptors).

If you have heavy search traffic, I would recommend not using scroll for
every user because of its cost. It is usually a better idea to just increase
the from parameter and to prevent your users from paging too deep, since deep
paging might kill your cluster. (If you go to any web search engine, you'll
see that even when it tells you that your query matched millions of
documents, it only lets you get hits for a few tens of pages.)
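
To make the complexity argument above concrete, here is a minimal Python sketch of the two strategies on a single shard. It is a toy model and not Lucene or Elasticsearch code: the function names `page_with_from` and `page_with_scroll`, the `(score, -doc_id)` sort key, and the in-memory list of scores are all assumptions made for illustration. The first function keeps a heap of from + size entries; the second keeps only size entries and skips anything that sorts at or ahead of the last hit of the previous page.

```python
import heapq


def page_with_from(scores, from_, size):
    """Hypothetical from/size paging: bounded min-heap of from_ + size entries."""
    n = from_ + size
    heap = []  # min-heap; heap[0] is the weakest hit kept so far
    for doc_id, score in enumerate(scores):
        # Larger key means better hit: higher score first, then lower doc id.
        entry = (score, -doc_id)
        if len(heap) < n:
            heapq.heappush(heap, entry)      # O(log(from + size))
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)   # O(log(from + size))
    ranked = sorted(heap, reverse=True)
    # Everything before from_ was collected only to be thrown away.
    return [(-neg_id, score) for score, neg_id in ranked[from_:]]


def page_with_scroll(scores, size, after=None):
    """Hypothetical scroll-style paging: heap of only `size` entries.

    `after` is the sort key of the last hit of the previous page
    (None for the first page).
    """
    heap = []
    for doc_id, score in enumerate(scores):
        entry = (score, -doc_id)
        if after is not None and entry >= after:
            continue  # already returned on an earlier page, skip it
        if len(heap) < size:
            heapq.heappush(heap, entry)      # O(log(size))
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)   # O(log(size))
    ranked = sorted(heap, reverse=True)
    hits = [(-neg_id, score) for score, neg_id in ranked]
    new_after = ranked[-1] if ranked else after
    return hits, new_after


if __name__ == "__main__":
    scores = [0.5, 2.0, 1.0, 2.0, 0.1, 3.0, 1.5]
    print(page_with_from(scores, from_=2, size=2))
    first_page, cursor = page_with_scroll(scores, size=2)
    second_page, cursor = page_with_scroll(scores, size=2, after=cursor)
    print(first_page, second_page)
```

Both approaches return the same second page here; the difference is only the size of the heap each one has to maintain. The sketch does not model the other half of scroll's cost, namely that the shard has to keep the original view of the index open between calls.
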
[1] https://github.com/elasticsearch/elasticsearch/issues/4940

--
Adrien Grand
