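Elasticsearch reports the total number of hits in every search response; let's call it T. So if your page size is P, you know that there are `ceil(T / P)` pages, and the page starting at offset `from` is the last one as soon as `from + P >= T`. You can therefore tell that there are no more documents without making an extra call.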
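A minimal sketch of that check in Python, assuming `total` is T as reported in the response and pages are fetched with the `from`/`size` parameters; `page_count` and `has_more` are hypothetical helper names:

```python
def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` hits, `page_size` hits per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer form of ceil(T / P); avoids float rounding for very large T.
    return (total + page_size - 1) // page_size


def has_more(total: int, from_: int, page_size: int) -> bool:
    """True if hits remain after the page that starts at offset `from_`.

    Only T from the current response is needed, so the last page can be
    detected without issuing one more (empty) request.
    """
    return from_ + page_size < total


# Example: T = 95 hits, P = 10 hits per page.
total, size = 95, 10
print(page_count(total, size))    # 10
print(has_more(total, 80, size))  # True: hits 90..94 are still left
print(has_more(total, 90, size))  # False: this page holds the last 5 hits
```

Since T is read from the response you already have, the client simply stops requesting pages once `has_more` returns false.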
On Fri, Apr 11, 2014 at 5:48 AM, Mohit Anchlia wrote:
> I have one more follow-up question: how can one know whether there are more documents or not? This is to avoid one extra last call if possible.
>
> On Thu, Apr 10, 2014 at 3:47 PM, Mohit Anchlia wrote:
>> Thanks Adrien and Nikolas, it's very helpful.
>>
>> On Thu, Apr 10, 2014 at 3:19 PM, Adrien Grand wrote:
>>> On Thu, Apr 10, 2014 at 11:13 PM, Nikolas Everett wrote:
>>>> This one is easy. Elasticsearch/Lucene has to keep a min-heap of the best documents found so far, together with their scores, and that heap is from + size entries big. Technically it is min(from + size, max(rescore_window_size)). Anyway, that means some part of the query has O(n) space and O(n * log(n)) time complexity, where n is from + size. That part might be dwarfed by some other action, but it is there. And technically, in the worst case, the time complexity is more like O(hits * log(n)), but that's not likely.
>>>
>>> Everything that Nikolas said is correct. I'd like to add that starting with Elasticsearch 1.2.0, paging with scroll is going to be more efficient[1], since the worst case will be O(hits * log(size)) instead of O(hits * log(from + size)). If you are interested in why this is possible: on each shard, scroll keeps track of the least document (the lowest-ranked one) among the hits of the previous page, so that documents that compare greater than it, i.e. that were already returned, can simply be ignored instead of being added to the priority queue.
>>>
>>> The issue with realtime is that it creates lots of segments that usually get merged very quickly. Scroll, on the other hand, works by asking the shard to keep open the view of the index that was used for the first page, until the scroll is closed. This can delay space reclamation and force Elasticsearch to keep a significant number of files open (beware of running out of file descriptors).
>>>
>>> If you have heavy search traffic, I would recommend against using scroll for every user because of its cost. It is usually a better idea to just increase the from parameter and prevent your users from performing deep paging, since it might kill your cluster. (If you go to any web search engine, you'll see that even if it tells you that your query matched millions of documents, it only lets you get hits for a few tens of pages.)
>>>
>>> [1] https://github.com/elasticsearch/elasticsearch/issues/4940
>>>
>>> --
>>> Adrien Grand
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6JwVMTfHr%2BdFbqRvBWJ2%2B2zAAR6g8T9C31-gXpYN4LWQ%40mail.gmail.com.
>>> For more options, visit https://groups.google.com/d/optout.
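The cost difference between from/size paging and scroll-style paging described in the quoted discussion can be illustrated with a small, single-shard Python sketch. It is a simplification under stated assumptions, not Lucene's actual collector: hits are hypothetical `(score, doc_id)` pairs, ties are assumed to be broken by lower `doc_id`, and `top_n`, `page_from_size` and `page_scroll` are illustrative names. With from/size, the bounded min-heap must hold `from + size` entries; the scroll-style variant first skips everything ranked at or before the last hit of the previous page, so its heap never holds more than `size` entries.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# Hypothetical hit representation: (score, doc_id). Assumed ranking: higher
# score first, ties broken by lower doc_id. Illustrative, not Lucene code.
Hit = Tuple[float, int]


def rank_key(hit: Hit) -> Tuple[float, int]:
    """Larger key = better-ranked hit."""
    score, doc_id = hit
    return (score, -doc_id)


def top_n(hits: Iterable[Hit], n: int, after: Optional[Hit] = None) -> List[Hit]:
    """Return the n best hits, best first, via a min-heap bounded to n entries.

    The heap root is always the worst hit kept so far, so each candidate costs
    at most O(log n): O(number of hits * log n) time and O(n) space overall.
    If `after` is given, hits ranked at or before it are skipped (scroll-style).
    """
    if n <= 0:
        return []
    limit = rank_key(after) if after is not None else None
    heap: List[Tuple[Tuple[float, int], Hit]] = []
    for hit in hits:
        key = rank_key(hit)
        if limit is not None and key >= limit:
            continue  # already returned on an earlier page
        if len(heap) < n:
            heapq.heappush(heap, (key, hit))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, hit))  # evict the current worst hit
    return [hit for _, hit in sorted(heap, reverse=True)]


def page_from_size(hits: Iterable[Hit], from_: int, size: int) -> List[Hit]:
    # from/size paging: the heap must hold from + size hits, and the first
    # `from_` of them are then discarded.
    return top_n(hits, from_ + size)[from_:]


def page_scroll(hits: Iterable[Hit], size: int, last: Optional[Hit]) -> List[Hit]:
    # Scroll-style paging: hits up to the previous page's last hit are filtered
    # out first, so the heap never holds more than `size` entries.
    return top_n(hits, size, after=last)


docs = [(1.5, 3), (2.0, 1), (0.5, 7), (2.0, 4), (1.0, 2), (0.8, 5), (1.2, 6)]
first = page_scroll(docs, 3, None)
second = page_scroll(docs, 3, first[-1])
print(second)                                # [(1.2, 6), (1.0, 2), (0.8, 5)]
print(second == page_from_size(docs, 3, 3))  # True
```

Both variants still visit every hit; the saving is only in the heap size, which is why the worst case goes from O(hits * log(from + size)) to O(hits * log(size)) rather than disappearing. The filtering step is only correct because scroll keeps the same view of the index across pages.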
-- 
Adrien Grand
