Scan/scroll is the best option to extract a huge amount of data. Never use size:10000000 or from:10000000.
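For illustration, here is a minimal sketch of what such a scroll loop could look like in Python. It assumes a client object exposing `search`, `scroll` and `clear_scroll` in the style of the official elasticsearch Python client; the exact keyword names (`body`, `scroll`, `size`) differ between client versions, so treat them as assumptions. The helper name `scroll_all` is hypothetical. With `search_type=scan` the first response carries no hits, which the loop tolerates.

```python
from typing import Any, Dict, Iterator


def scroll_all(client: Any, index: str, query: Dict[str, Any],
               size: int = 1000, keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, one scroll page at a time.

    `client` is assumed to behave like the official Python client
    (search / scroll / clear_scroll); adjust keyword names to your version.
    """
    # Open the scroll context; `size` is the page size, not the total.
    resp = client.search(index=index, body={"query": query},
                         scroll=keep_alive, size=size)
    scroll_id = resp.get("_scroll_id")
    try:
        # The first response may be empty (scan) or hold the first page.
        yield from resp["hits"]["hits"]
        while scroll_id:
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            # The scroll id may change between calls, so always keep the latest.
            scroll_id = resp.get("_scroll_id", scroll_id)
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means the scroll is exhausted
            yield from hits
    finally:
        # Release the server-side scroll context as soon as we are done.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)
```

Something like `for hit in scroll_all(es, "accounts", {"match_all": {}}, size=10000)` would then stream all documents without ever asking for a huge `size` or `from`; the index name "accounts" is only a placeholder.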
Scroll is not realtime because it iterates over a fixed set of segments: changes that land in new segments after the scroll has started are not taken into account. That is a good thing, because you won't get inconsistent results.

About size, I would try and test; I believe it depends on the size of your docs. Start with 10000 and see how it goes as you increase it. You may discover that getting 10*10000 docs is the same as 1*100000. :)

Best
David

> On 10 Dec 2014 at 19:09, Ron Sher <[email protected]> wrote:
>
> Hi,
>
> I was wondering about best practices to get all data according to some
> filters.
> The options as I see them are:
> 1. Use a very big size that will return all accounts, i.e. use some value
>    like 1m to make sure I get everything back (even if I need just a few
>    hundreds or tens of documents). This is the quickest way, development
>    wise.
> 2. Use paging - using size and from. This requires looping over the result
>    and the performance gets worse as we advance to later pages. Also, we
>    need to use preference if we want to get consistent results over the
>    pages. Also, it's not clear what's the recommended size for each page.
> 3. Use scan/scroll - this gives consistent paging but also has several
>    drawbacks: if I use search_type=scan then it can't be sorted; using
>    scan/scroll is (maybe) less performant than paging (the documentation
>    says it's not for realtime use); again, not clear which size is
>    recommended.
> So you see - many options and not clear which path to take.
>
> What do you think?
>
> Thanks,
> Ron
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/764a37c5-1fec-48c4-9c66-7835d8141713%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
