I have one more follow-up question: how can one know whether there are more documents or not? This is to avoid one extra last call, if possible.

On Thu, Apr 10, 2014 at 3:47 PM, Mohit Anchlia <[email protected]> wrote:

> Thanks Adrien and Nikolas, it's very helpful.
>
> On Thu, Apr 10, 2014 at 3:19 PM, Adrien Grand <[email protected]> wrote:
>
>> On Thu, Apr 10, 2014 at 11:13 PM, Nikolas Everett <[email protected]> wrote:
>>
>>> This one is easy. Elasticsearch/lucene has to keep a min heap of all
>>> the documents you find and the score that is from + size big. Technically
>>> it is min(from + size, max(rescore_window_size)). Anyway, that means some
>>> part of the query has O(n) space and O(n * log(n)) time complexity where n
>>> is from + size. That part might be dwarfed by some other action but it is
>>> there. And technically in the worst case the time complexity is more like
>>> O(hits * log(n)) but that's not likely.
>>
>> Everything that Nikolas said is correct. I'd like to add that starting
>> with Elasticsearch 1.2.0, paging with scroll is going to be more
>> efficient[1] since the worst case will be O(hits * log(size)) instead of
>> O(hits * log(from + size)). If you are interested in why it is possible,
>> the reason is that on each shard, scroll is going to keep track of the
>> least document that is part of the hits of the previous page, so that you
>> can just ignore documents that compare greater than this document instead
>> of adding them to the priority queue.
>>
>> The issue with realtime is that it creates lots of segments that usually
>> get merged very quickly. On the other hand, scroll works by asking the
>> shard to keep open the view over the index that was used for the first
>> page, until the scroll is closed. This can delay space reclamation and
>> force Elasticsearch to keep a significant number of files open (beware of
>> running out of file descriptors).
>>
>> If you have important search traffic, I would recommend not to use scroll
>> for every user because of its cost. It is usually a better idea to just
>> increase the from parameter and prevent your users from performing deep
>> paging since it might kill your cluster. (If you go to any web search
>> engine, you'll see that even if they tell you that your query matched
>> millions of documents, they only allow you to get hits for a few tens of
>> pages.)
>>
>> [1] https://github.com/elasticsearch/elasticsearch/issues/4940
>>
>> --
>> Adrien Grand
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6JwVMTfHr%2BdFbqRvBWJ2%2B2zAAR6g8T9C31-gXpYN4LWQ%40mail.gmail.com.
>>
>> For more options, visit https://groups.google.com/d/optout.
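
For readers following the complexity argument in the quoted thread above, here is a minimal Python sketch of the two collection strategies Nikolas and Adrien describe. It is a toy model only, not Elasticsearch or Lucene code: the function names `page_with_from` and `page_after` are hypothetical, documents are reduced to bare scores, and the scores are assumed to be distinct (a real engine breaks ties, for example by document id).

```python
import heapq


def page_with_from(scores, from_, size):
    """from/size paging: keep the best (from_ + size) scores in a min-heap."""
    n = from_ + size
    heap = []  # min-heap; heap[0] is the weakest score currently kept
    for s in scores:
        if len(heap) < n:
            heapq.heappush(heap, s)
        elif s > heap[0]:
            heapq.heapreplace(heap, s)  # evict the weakest, keep s
    ranked = sorted(heap, reverse=True)
    return ranked[from_:from_ + size]


def page_after(scores, size, last=None):
    """Scroll-style paging: skip hits already returned, heap holds only `size`.

    `last` is the weakest score of the previous page (None for the first page).
    Assumes distinct scores, so `s >= last` means "already returned".
    """
    heap = []
    for s in scores:
        if last is not None and s >= last:
            continue  # ranked at or above the previous page's last hit
        if len(heap) < size:
            heapq.heappush(heap, s)
        elif s > heap[0]:
            heapq.heapreplace(heap, s)
    return sorted(heap, reverse=True)


scores = [5, 1, 9, 3, 7, 2, 8, 6, 4, 10]

# Second page of three hits via from/size: the heap grows to 6 entries.
deep_page = page_with_from(scores, from_=3, size=3)

# Same page via the scroll-style cursor: the heap never exceeds 3 entries.
first_page = page_after(scores, size=3)
second_page = page_after(scores, size=3, last=first_page[-1])

print(deep_page)    # [7, 6, 5]
print(second_page)  # [7, 6, 5]
```

Both calls return the same page, but the heap in `page_with_from` is bounded by from + size while the heap in `page_after` is bounded by size alone, which is the difference between O(hits * log(from + size)) and O(hits * log(size)) mentioned in the thread.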
