Get a fixed random sample from all documents

Sebastian Rickelt Fri, 24 Apr 2015 07:03:05 -0700

Hi,

I want to fetch a large, fixed-size random sample of documents from 
Elasticsearch to compute some statistics (100,000 out of 10 M documents). 
The sampling has to be reproducible, so that I get the same documents with 
every request.


My problem is that scan and scroll is fast but, as I understand it, the 
order of the results is not predictable. On the other hand, I could use the 
'random_score' function with a fixed seed in my query. That would fix the 
order problem, but deep pagination is very slow. Has anyone done this 
before? Any ideas or pointers on how to do this with Elasticsearch?
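
To make the second option concrete, here is a minimal sketch of the kind of 
request body I mean, built as a plain Python dict. The helper name 
seeded_random_query, the seed 42 and the page size 1000 are placeholders 
chosen for illustration, and the exact 'random_score' syntax may differ 
between Elasticsearch versions, so treat this as an assumption rather than 
a tested request.

```python
import json


def seeded_random_query(seed, size, offset=0):
    """Build a search body that scores every document with a seeded random value.

    Hypothetical helper for illustration only; it builds the request body
    and does not talk to a cluster.
    """
    return {
        "from": offset,   # pagination offset into the randomly ordered hits
        "size": size,     # number of hits per page
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # A fixed seed is meant to give the same ordering on every request.
                "random_score": {"seed": seed},
            }
        },
    }


body = seeded_random_query(seed=42, size=1000)
print(json.dumps(body, indent=2))
```

Collecting 100,000 hits this way means raising "from" on every request, 
which is exactly where the deep pagination cost comes from.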

Any help appreciated.

Cheers,

Sebastian

