Hi guys,
Just wondering: what is the most efficient way to execute a query that
takes time (parent/child documents) and returns a large number of entries,
and then store the results in random, evenly divided blocks on HDFS? E.g.,
the query will return 100 million records, and I want each random 1 million
stored in a different location (file/folder) on HDFS.

I assume I could execute the query with scroll, and then whenever I
receive 1 million records back, spawn another thread to commit them to
HDFS? Is there a way to run the query in a distributed fashion, with 100
threads querying ES at the same time and each getting a random 1 million
back (without duplicates)? Will ES-Hadoop help in this case?
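The single-scroll variant of the idea above can be sketched like this. This is only a sketch of the chunking side, not a tested implementation: `fetch_hits` stands in for a real scroll loop (e.g. `elasticsearch.helpers.scan` in the official Python client), and the chunk size, index name, and `part-NNNNN` path layout are all my own hypothetical choices. Each full chunk would then be handed to a writer thread that commits it to its own HDFS path.

```python
import itertools
from typing import Dict, Iterable, Iterator, List

def iter_chunks(records: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Group a streaming result set into fixed-size chunks.

    The last chunk may be smaller than chunk_size. Because we only ever
    hold one chunk in memory, this works for scrolls of any length.
    """
    it = iter(records)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def chunks_to_paths(hits: Iterable[dict], chunk_size: int,
                    base_path: str) -> Dict[str, List[dict]]:
    """Assign each chunk a distinct HDFS-style output path.

    In a real job, instead of collecting into a dict, each (path, chunk)
    pair would be passed to a worker thread that writes it out.
    """
    out = {}
    for i, chunk in enumerate(iter_chunks(hits, chunk_size)):
        # Hypothetical path scheme: one part file per chunk.
        out[f"{base_path}/part-{i:05d}"] = chunk
    return out

# fetch_hits is a stand-in for the scroll loop; with the real client this
# would be something like: helpers.scan(es, index="myindex", query=query)
def fetch_hits(n: int) -> Iterator[dict]:
    return ({"_id": i} for i in range(n))

if __name__ == "__main__":
    # 10 records in chunks of 3 -> 4 output "files" (3, 3, 3, 1 records).
    result = chunks_to_paths(fetch_hits(10), chunk_size=3,
                             base_path="/data/export")
    for path, chunk in result.items():
        print(path, len(chunk))
```

Note that a plain scroll is single-threaded on the client side; to get the 100-way parallelism you describe, you would need to partition the query itself (ES-Hadoop does this per shard), since concurrent independent scrolls over the same query would otherwise return overlapping documents.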

Appreciate your input!
Chen

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CACim9Rm64uHE9EQ35r_mJr9VhiEbDfD-70vS1uQHSG6UXM7ZDQ%40mail.gmail.com.