Sounds good. If you are using Java, you could also look at the river code. Note that you should use the BulkProcessor class, which is super handy.
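To make steps 2 and 3 from your list concrete, here is a minimal sketch of the batching idea in plain Java. It is not the Elasticsearch BulkProcessor API: `TweetBatcher` and its `bulkSender` callback are hypothetical names used only for illustration, and the callback stands in for whatever code actually sends one bulk request. In a real application you would let BulkProcessor do this work instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: buffers documents and hands full batches to a worker thread. */
final class TweetBatcher implements AutoCloseable {
    private final int batchSize;
    // Assumption: this callback wraps the code that sends one bulk request.
    private final Consumer<List<String>> bulkSender;
    // A single worker keeps batches in order; the streaming thread never blocks on indexing.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private List<String> buffer = new ArrayList<>();

    TweetBatcher(int batchSize, Consumer<List<String>> bulkSender) {
        this.batchSize = batchSize;
        this.bulkSender = bulkSender;
    }

    /** Called by the streaming thread for every tweet received. */
    synchronized void add(String tweetJson) {
        buffer.add(tweetJson);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands the current buffer to the worker thread and starts a new one. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        worker.submit(() -> bulkSender.accept(batch));
    }

    /** Sends any remaining documents and waits for the worker to finish. */
    @Override
    public void close() throws InterruptedException {
        flush();
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        List<Integer> sizes = Collections.synchronizedList(new ArrayList<>());
        try (TweetBatcher batcher = new TweetBatcher(1000, batch -> sizes.add(batch.size()))) {
            for (int i = 0; i < 2500; i++) {
                batcher.add("{\"id\":" + i + "}");
            }
        }
        System.out.println(sizes); // prints [1000, 1000, 500]
    }
}
```

The single worker thread is a deliberate simplification: it keeps batches in order and caps the number of in-flight bulk requests at one. Whether more concurrency helps is something to measure on the real cluster.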
BTW, I said 10000/s, but not for tweets. I have fewer fields (20) than Twitter (>100). With more fields, I guess it would take more time, though with better machines it could work. I'd say that you need to test on the production cluster.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 15 Jan 2015, at 15:40, Chinch Pokli <[email protected]> wrote:
>
> Awesome! Great to know that. So as a conclusion the steps will be:
> 1) Stream tweets from twitter
> 2) Use the bulk API to make batches of 1000 (or more) tweets
> 3) Once the batch size is reached, spawn a new thread which will index the
> data into ES, meanwhile my original thread will continue streaming tweets
>
> Do these steps sound alright to you or did I miss something?
>
>> On Thursday, January 15, 2015 at 7:58:19 PM UTC+5:30, David Pilato wrote:
>> I can index on my laptop 10000-12000 docs per second. SSD drives of course.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>>> On 15 Jan 2015, at 13:43, Chinch Pokli <[email protected]> wrote:
>>>
>>> No, so the whole point was: will elasticsearch be able to index, say,
>>> 10,000 documents per second? If yes, I can simply hook up my twitter code
>>> to es. If not, I would need to think of how to make that happen.
>>> Typically I've seen es index just around 30 docs per second, which is
>>> pretty low.
>>>
>>> I am hoping Redis/ Kafka/ Logstash/ etc. might help elasticsearch to get
>>> some breathing room and enable it to index up to 10K docs per second.
>>>
>>>> On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:
>>>> You have a Twitter input so you can extract content from Twitter and send
>>>> to elasticsearch. No need to have Redis here.
>>>>
>>>> --
>>>> David ;-)
>>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>>
>>>>> On 15 Jan 2015, at 00:02, Chinch Pokli <[email protected]> wrote:
>>>>>
>>>>> Thanks. I'll have a look at the raw option.
>>>>> Regarding logstash, I don't fully understand its utility. It says that
>>>>> it can take messages from a Redis server. But if I have to set up Redis,
>>>>> I could simply use the Redis river to index into Elasticsearch. Is there
>>>>> any additional benefit that Logstash would give me?
>>>>>
>>>>>> On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:
>>>>>> You should look at raw option or better look at Logstash.
>>>>>>
>>>>>> My 2 cents.
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> On 14 Jan 2015, at 23:29, Chinch Pokli <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am using elasticsearch to index the twitter stream. Until recently I was
>>>>>>> using the official river, which was working great, but realized that it
>>>>>>> was throwing out much of the data (e.g. it is not storing number of
>>>>>>> followers etc. data).
>>>>>>>
>>>>>>> Is there a way to make the river store all the data? If not, I am
>>>>>>> fine with writing streaming code which will stream and index. But I
>>>>>>> have a concern. How many documents can elasticsearch index per second?
>>>>>>> I might eventually need to index almost 10,000 documents (each document
>>>>>>> = 2 KB) per second (current requirement is of 100 documents per
>>>>>>> second). Is this even feasible? If yes, do I need to make any special
>>>>>>> modifications?
>>>>>>>
>>>>>>> Thanks-in-advance!!
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>>> an email to [email protected].
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/da547692-903b-4793-a77e-fd5f0b5a01b7%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
