A search with from/size consumes O(offset + size) memory and
O((offset + size) * ln(offset + size)) CPU. Scan/scroll has higher
per-request overhead but stays O(size) the whole time. I don't know where
the break-even point is.
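
To make those asymptotics concrete, here is a rough back-of-envelope sketch in Python. It only plugs the complexities stated above into a loop; the function names are made up for illustration, constant factors and the per-request overhead of scroll are ignored, and treating scroll CPU as proportional to size per request is an assumption, not a measurement.

```python
import math


def paged_search_cost(total_docs, page_size):
    """Walk every page with from/size and apply the stated complexities.

    Returns (peak memory units, total CPU units). Hypothetical cost model,
    not a measurement of Elasticsearch.
    """
    memory_peak = 0
    cpu_total = 0.0
    for offset in range(0, total_docs, page_size):
        n = offset + page_size            # entries that must be collected and sorted
        memory_peak = max(memory_peak, n)
        cpu_total += n * math.log(n)      # O((offset + size) * ln(offset + size))
    return memory_peak, cpu_total


def scroll_cost(total_docs, page_size):
    """Same walk with scroll: every request is assumed to cost O(size)."""
    pages = math.ceil(total_docs / page_size)
    return page_size, float(pages * page_size)


paged = paged_search_cost(10_000, 1_000)
scrolled = scroll_cost(10_000, 1_000)
print("from/size:", paged)
print("scroll:   ", scrolled)
```

In this model the last from/size page alone needs as much memory as the whole result set, while scroll never holds more than one batch.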

The other thing is that scroll provides a consistent snapshot. That means
it holds resources on the cluster, so you shouldn't expose it to end users,
but it won't miss results or return repeats the way paging with an
increasing offset can.
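
The repeat/miss problem can be shown with a toy in-memory simulation. This is only a sketch: a plain Python list stands in for the index, `page` is a hypothetical helper, and a copied list plays the role of the snapshot that scroll holds.

```python
def page(index, offset, size):
    """Return one page of a sorted view of the index (toy from/size)."""
    return sorted(index)[offset:offset + size]


live = [10, 20, 30, 40, 50, 60]
snapshot = list(live)              # stands in for the frozen view a scroll keeps

first = page(live, 0, 3)           # [10, 20, 30]
live.append(5)                     # a new doc arrives that sorts before everything
second = page(live, 3, 3)          # everything shifted right by one position

offset_results = first + second
snapshot_results = page(snapshot, 0, 3) + page(snapshot, 3, 3)

print(offset_results)              # 30 appears twice and 60 is never returned
print(snapshot_results)            # every original doc exactly once
```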

You can certainly do large fetches with a big size, but it's less stable in
general.

Finally, scan/scroll has always been pretty quick for me. I usually use a
batch size in the thousands.
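
For what it's worth, the loop I mean looks roughly like the sketch below. It is not tied to any client library: `fetch_batch` is a hypothetical callable that takes a cursor and a batch size and returns the next cursor plus a list of hits, and `make_fake_fetch` is an in-memory stand-in so the loop can be run as-is. The one real convention it relies on is that a scroll is finished when a request comes back with no hits.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical shape of a "give me the next batch" call:
# (cursor or None, batch size) -> (next cursor, hits)
FetchBatch = Callable[[Optional[int], int], Tuple[int, List[dict]]]


def drain(fetch_batch: FetchBatch, batch_size: int = 1000) -> Iterator[dict]:
    """Yield every hit, one batch at a time, until an empty batch comes back."""
    cursor = None
    while True:
        cursor, hits = fetch_batch(cursor, batch_size)
        if not hits:
            return
        yield from hits


def make_fake_fetch(docs):
    """In-memory stand-in for a scroll endpoint over a frozen list of docs."""
    frozen = list(docs)

    def fetch(cursor, size):
        start = cursor or 0
        hits = frozen[start:start + size]
        return start + len(hits), hits

    return fetch


docs = [{"id": i} for i in range(2500)]
fetched = list(drain(make_fake_fetch(docs), batch_size=1000))
print(len(fetched))
```

Only one batch is in memory at a time, so the batch size is the knob to tune against document size.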

Nik
On Dec 14, 2014 4:13 AM, "David Pilato" <[email protected]> wrote:

> Implication is the memory needed to be allocated on each shard.
>
>
> David
>
> On Dec 14, 2014, at 05:46, Ron Sher <[email protected]> wrote:
>
> Again, why not use a very large count size? What are the implications of
> using a very large count?
> Regarding performance - it seems doing 1 request with a very large count
> performs better than using scan scroll (with count of 100 using 32 shards)
>
> On Wednesday, December 10, 2014 10:53:50 PM UTC+2, David Pilato wrote:
>>
>> No I did not say that. Or I did not mean that. Sorry if it was unclear.
>> I said: don’t use large sizes:
>>
>> Never use size:10000000 or from:10000000.
>>>
>>
>> You should read this:
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
>>
>> --
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com
>> <http://Elasticsearch.com>*
>> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr
>> <https://twitter.com/elasticsearchfr> | @scrutmydocs
>> <https://twitter.com/scrutmydocs>
>>
>>
>>
>> On Dec 10, 2014, at 21:16, Ron Sher <[email protected]> wrote:
>>
>> So you're saying there's no impact on elasticsearch if I issue a large
>> size?
>> If that's the case then why shouldn't I just call size of 1M if I want to
>> make sure I get everything?
>>
>> On Wednesday, December 10, 2014 8:22:47 PM UTC+2, David Pilato wrote:
>>>
>>> Scan/scroll is the best option to extract a huge amount of data.
>>> Never use size:10000000 or from:10000000.
>>>
>>> It's not realtime because you basically scroll over a given set of
>>> segments and all new changes that will come in new segments won't be taken
>>> into account during the scroll.
>>> Which is good because you won't get inconsistent results.
>>>
>>> About size, I would try and test. It depends on your document size, I
>>> believe.
>>> Try with 10000 and see how it goes when you increase it. You may
>>> discover that getting 10*10000 docs is the same as 1*100000. :)
>>>
>>> Best
>>>
>>> David
>>>
>>> On Dec 10, 2014, at 19:09, Ron Sher <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I was wondering about best practices to get all data according to
>>> some filters.
>>> The options as I see them are:
>>>
>>>    - Use a very big size that will return all accounts, i.e. use some
>>>    value like 1m to make sure I get everything back (even if I need
>>>    just a few hundreds or tens of documents). This is the quickest way,
>>>    development wise.
>>>    - Use paging - using size and from. This requires looping over the
>>>    result and the performance gets worse as we advance to later pages.
>>>    Also, we need to use preference if we want to get consistent results
>>>    over the pages. Also, it's not clear what's the recommended size for
>>>    each page.
>>>    - Use scan/scroll - this gives consistent paging but also has
>>>    several drawbacks: If I use search_type=scan then it can't be sorted;
>>>    using scan/scroll is (maybe) less performant than paging (the
>>>    documentation says it's not for realtime use); again not clear which
>>>    size is recommended.
>>>
>>> So you see - many options and not clear which path to take.
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Ron
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/764a37c5-1fec-48c4-9c66-7835d8141713%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>

