Thanks, Adrien and Nikolas, this is very helpful.

On Thu, Apr 10, 2014 at 3:19 PM, Adrien Grand <
[email protected]> wrote:

> On Thu, Apr 10, 2014 at 11:13 PM, Nikolas Everett <[email protected]> wrote:
>
>> This one is easy.  Elasticsearch/Lucene has to keep a min-heap of the
>> best documents found so far, with their scores, that is from + size
>> entries big.  Technically it is min(from + size, max(rescore_window_size)).
>> Anyway, that means some part of the query has O(n) space and
>> O(n * log(n)) time complexity, where n is from + size.  That part might be
>> dwarfed by some other action, but it is there.  And technically, in the
>> worst case, the time complexity is more like O(hits * log(n)), but that's
>> not likely.
>>
>
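
To make the heap argument above concrete, here is a minimal Python sketch of
collecting one page with a bounded min-heap of from + size entries. It only
illustrates the idea and is not Elasticsearch's or Lucene's actual code; the
name top_page, the plain list of scores, and the tie-break on lower doc id
are assumptions made for the example.

```python
import heapq


def top_page(scores, from_, size):
    """Return doc ids for the page [from_, from_ + size), best score first.

    `scores` is a hypothetical list where the index is the doc id.
    """
    n = from_ + size
    heap = []  # min-heap of (score, -doc_id); never holds more than n entries
    for doc_id, score in enumerate(scores):
        entry = (score, -doc_id)  # assumed tie-break: lower doc id ranks first
        if len(heap) < n:
            heapq.heappush(heap, entry)      # O(log n)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)   # evict the current worst, O(log n)
    ranked = sorted(heap, reverse=True)      # best hit first
    return [-neg_id for _, neg_id in ranked[from_:]]


print(top_page([0.5, 0.9, 0.1, 0.7, 0.3], from_=2, size=2))
```

Every hit can touch the heap once, which is where the O(hits * log(n)) worst
case comes from, and the heap itself is the O(n) space.
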
> Everything that Nikolas said is correct. I'd like to add that starting
> with Elasticsearch 1.2.0, paging with scroll is going to be more
> efficient[1] since the worst case will be O(hits * log(size)) instead of
> O(hits * log(from + size)). If you are interested in why this is possible,
> the reason is that on each shard, scroll is going to keep track of the
> least competitive document among the hits of the previous page, so that
> you can just ignore documents that compare greater than this document
> (they were already returned on earlier pages) instead of adding them to
> the priority queue.
>
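
The scroll optimization described above can be sketched roughly as follows in
Python. This is an illustration under my own assumptions, not the actual
implementation from the linked issue: next_scroll_page, the (score, -doc_id)
sort key, and the in-memory list of scores are all hypothetical.

```python
import heapq


def next_scroll_page(scores, size, after=None):
    """Return (doc_ids, last_entry) for the next page of a scroll.

    `after` is the sort key of the last hit of the previous page,
    or None for the first page.
    """
    heap = []  # min-heap of (score, -doc_id); never holds more than `size` entries
    for doc_id, score in enumerate(scores):
        entry = (score, -doc_id)
        if after is not None and entry >= after:
            continue  # already returned on an earlier page, never enters the heap
        if len(heap) < size:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    ranked = sorted(heap, reverse=True)
    last = ranked[-1] if ranked else after
    return [-neg_id for _, neg_id in ranked], last


scores = [0.5, 0.9, 0.1, 0.7, 0.3]
page, last = next_scroll_page(scores, size=2)
while page:
    print(page)
    page, last = next_scroll_page(scores, size=2, after=last)
```

Because the heap only ever holds `size` entries, each insertion costs
O(log(size)) no matter how deep the scroll has gone.
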
> The issue with realtime is that it creates lots of segments that usually
> get merged very quickly. On the other hand, scroll works by asking the
> shard to keep open the view over the index that was used for the first
> page, until the scroll is closed. This can delay space reclamation and
> force Elasticsearch to keep a significant number of files open (beware of
> running out of file descriptors).
>
> If you have significant search traffic, I would recommend against using
> scroll for every user because of its cost. It is usually a better idea to
> just increase the from parameter and prevent your users from paging too
> deep, since deep paging might kill your cluster. (If you go to any web
> search engine, you'll see that even if they tell you that your query
> matched millions of documents, they only let you get hits for a few tens
> of pages.)
>
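
One way to apply the advice above on the application side is to cap the page
number before building the from/size values of the search request. The
sketch below is only an assumption about how that could look; MAX_PAGES,
paging_params, and the default page size are hypothetical names and values,
not Elasticsearch settings.

```python
MAX_PAGES = 20  # hypothetical cap, "a few tens of pages"


def paging_params(page, page_size=10, max_pages=MAX_PAGES):
    """Translate a 1-based page number into from/size, refusing deep pages."""
    if page < 1 or page > max_pages:
        raise ValueError(f"page must be between 1 and {max_pages}")
    return {"from": (page - 1) * page_size, "size": page_size}


print(paging_params(3))  # {'from': 20, 'size': 10}
```

With this cap, from + size never exceeds max_pages * page_size, so the
per-shard priority queue stays small.
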
> [1] https://github.com/elasticsearch/elasticsearch/issues/4940
>
> --
> Adrien Grand
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6JwVMTfHr%2BdFbqRvBWJ2%2B2zAAR6g8T9C31-gXpYN4LWQ%40mail.gmail.com.
>
> For more options, visit https://groups.google.com/d/optout.
>
