Hi all, I'm wondering about the most efficient way to execute a query that takes a long time (parent/child documents) and returns a very large number of entries, and to store the results in randomly, evenly divided blocks on HDFS. For example, the query will return 100 million records, and I want each random block of 1 million stored in a different location (file/folder) on HDFS.
I assume I could execute the query with scroll, and then, whenever I receive 1 million records back, spawn another thread to commit them to HDFS. Is there a way to run the query in a distributed fashion and have 100 threads query ES at the same time, each getting a distinct random 1 million records back (without duplicates)? Would ES-Hadoop help in this case? Appreciate your input!

Chen
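P.S. For reference, here's roughly the scroll half of the plan I had in mind. This is only a sketch, assuming the elasticsearch-py client; the index name, query, and sizes are all made-up placeholders, and the HDFS write is left to the caller:

```python
import itertools


def scroll_hits(es, index, query, page_size=10_000, scroll="5m"):
    """Yield every hit for `query` via the scroll API (single consumer).

    `es` is assumed to be an elasticsearch-py client; `index` and `query`
    are illustrative placeholders for the real parent/child search.
    """
    resp = es.search(index=index, body={"query": query},
                     scroll=scroll, size=page_size)
    sid = resp["_scroll_id"]
    while resp["hits"]["hits"]:
        for hit in resp["hits"]["hits"]:
            yield hit
        resp = es.scroll(scroll_id=sid, scroll=scroll)
        sid = resp["_scroll_id"]
    es.clear_scroll(scroll_id=sid)  # release scroll resources on the cluster


def blocks(records, size):
    """Group any iterable into lists of `size`; the last block may be short."""
    it = iter(records)
    while True:
        block = list(itertools.islice(it, size))
        if not block:
            return
        yield block
```

Each 1-million-record block from `blocks(scroll_hits(es, "myindex", q), 1_000_000)` would then be handed off to a writer thread that commits it to its own HDFS path (e.g. a hypothetical `/data/part-NNNNN`). What I don't see is how to get 100 such consumers reading disjoint slices in parallel.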
