Thanks, Adrien and Nikolas, this is very helpful.

On Thu, Apr 10, 2014 at 3:19 PM, Adrien Grand <[email protected]> wrote:
> On Thu, Apr 10, 2014 at 11:13 PM, Nikolas Everett <[email protected]> wrote:
>
>> This one is easy. Elasticsearch/Lucene has to keep a min-heap, from + size
>> entries big, of the documents you find and their scores. Technically it
>> is min(from + size, max(rescore_window_size)). Anyway, that means some
>> part of the query has O(n) space and O(n * log(n)) time complexity, where n
>> is from + size. That part might be dwarfed by some other action, but it is
>> there. And technically in the worst case the time complexity is more like
>> O(hits * log(n)), but that's not likely.
>
> Everything that Nikolas said is correct. I'd like to add that starting
> with Elasticsearch 1.2.0, paging with scroll is going to be more
> efficient[1], since the worst case will be O(hits * log(size)) instead of
> O(hits * log(from + size)). If you are interested in why this is possible,
> the reason is that on each shard, scroll is going to keep track of the
> least competitive document among the hits of the previous page, so that
> documents that compare greater than it (and so were already returned) can
> simply be ignored instead of being added to the priority queue.
>
> The issue with realtime is that it creates lots of segments that usually
> get merged very quickly. On the other hand, scroll works by asking the
> shard to keep open the view over the index that was used for the first
> page, until the scroll is closed. This can delay space reclamation and
> force Elasticsearch to keep a significant number of files open (beware of
> running out of file descriptors).
>
> If you have significant search traffic, I would recommend not using scroll
> for every user because of its cost. It is usually a better idea to just
> increase the from parameter and prevent your users from performing deep
> paging, since it might kill your cluster. (If you go to any web search
> engine, you'll see that even if they tell you that your query matched
> millions of documents, they only allow you to get hits for a few tens of
> pages.)
>
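
As an illustration of the two costs described above, here is a minimal Python sketch. It is not the Lucene or Elasticsearch implementation. The function names, the (score, doc id) tuples and the tie-break on doc id are all assumptions made up for the example. page_with_from keeps a heap of from + size entries, while page_after keeps only size entries because it skips every hit that ranks at or above the last hit of the previous page.

```python
import heapq


def _key(score, doc_id):
    # Assumed ordering: higher score ranks first, ties go to the lower doc id.
    return (score, -doc_id)


def _top(keys, n):
    """Keep the n largest keys using a min-heap that never grows past n entries."""
    heap = []
    if n <= 0:
        return heap
    for key in keys:
        if len(heap) < n:
            heapq.heappush(heap, key)
        elif key > heap[0]:  # heap[0] is the weakest hit kept so far
            heapq.heapreplace(heap, key)
    return sorted(heap, reverse=True)  # best hit first


def page_with_from(hits, from_, size):
    """from/size paging: the heap holds from_ + size entries, then from_ are dropped."""
    ranked = _top((_key(s, d) for s, d in hits), from_ + size)
    return [(s, -d) for s, d in ranked[from_:]]


def page_after(hits, size, after=None):
    """Scroll-style paging (hypothetical helper): the heap holds only `size` entries."""
    last = _key(*after) if after is not None else None
    # Hits ranking at or above the previous page's last hit were already
    # returned, so they are filtered out before they ever reach the heap.
    keys = (k for k in (_key(s, d) for s, d in hits) if last is None or k < last)
    return [(s, -d) for s, d in _top(keys, size)]


# Made-up data: 200 hits as (score, doc_id) pairs, with some tied scores.
hits = [((i * 37) % 101, i) for i in range(200)]
page1 = page_after(hits, 10)
page2 = page_after(hits, 10, after=page1[-1])
page3 = page_after(hits, 10, after=page2[-1])
```

Here page3 is the same page that page_with_from(hits, 20, 10) returns, but it was built with a 10-entry heap instead of a 30-entry one. The sketch rescans the same list for every page. A real scroll instead keeps the shard's view of the index open between pages, which is where the file descriptor cost mentioned above comes from.
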
> [1] https://github.com/elasticsearch/elasticsearch/issues/4940
>
> --
> Adrien Grand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAOT3TWo6Sm9vxc-wsE6nyE3nE8w9Ke6eE3EQbeXtihYUbOskHg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
