Sounds good.
If you are using Java, you could also look at the river code.
Note that you should use the BulkProcessor class, which is super handy.

BTW, I said 10000/s, but that was not for tweets. My documents have fewer 
fields (20) than a tweet (>100).
With more fields, I guess indexing would take more time, though with better 
machines it could work. I'd say that you need to test on the production cluster.
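
To make the batching steps discussed in this thread concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch client at all: the class name BatchingIndexer and the bulkSink callback are hypothetical stand-ins for whatever actually sends one bulk request, and the batch size of 1000 is just the figure mentioned in the thread. BulkProcessor is meant to take care of this kind of buffering for you, so treat this only as an illustration of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Buffers documents and hands each full batch to a background thread. */
class BatchingIndexer implements AutoCloseable {
    private final int batchSize;
    // Hypothetical sink: in real code this would send one bulk request.
    private final Consumer<List<String>> bulkSink;
    // One worker thread, so batches are sent in the order they were filled.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private List<String> buffer = new ArrayList<>();

    BatchingIndexer(int batchSize, Consumer<List<String>> bulkSink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.bulkSink = bulkSink;
    }

    /** Called by the streaming thread for every incoming document. */
    synchronized void add(String jsonDoc) {
        buffer.add(jsonDoc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands the current buffer to the worker and starts a fresh one. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> batch = buffer;
        buffer = new ArrayList<>(batchSize);
        worker.submit(() -> bulkSink.accept(batch));
    }

    /** Flushes the remainder and waits for pending batches to finish. */
    @Override
    public void close() throws InterruptedException {
        flush();
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The streaming thread only calls add(), so it never waits on indexing. Note that the worker's queue is unbounded in this sketch: if batches arrive faster than they can be indexed, they pile up in memory, which is one more reason to measure on the real cluster.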

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

> On 15 Jan 2015, at 15:40, Chinch Pokli <[email protected]> wrote:
> 
> Awesome! Great to know that. So, to conclude, the steps will be:
> 1) Stream tweets from Twitter
> 2) Use the bulk API to make batches of 1000 (or more) tweets
> 3) Once the batch size is reached, spawn a new thread which will index the 
> data into ES, while my original thread continues streaming tweets
> 
> Do these steps sound alright to you or did I miss something?
> 
>> On Thursday, January 15, 2015 at 7:58:19 PM UTC+5:30, David Pilato wrote:
>> I can index 10000-12000 docs per second on my laptop. With SSD drives, of course.
>> 
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>> 
>>> On 15 Jan 2015, at 13:43, Chinch Pokli <[email protected]> wrote:
>>> 
>>> No, the whole point was this: will Elasticsearch be able to index, say, 
>>> 10,000 documents per second? If yes, I can simply hook up my Twitter code 
>>> to ES. If not, I would need to think about how to make that happen.
>>> Typically I've seen ES index only around 30 docs per second, which is 
>>> pretty low.
>>> 
>>> I am hoping Redis, Kafka, Logstash, etc. might give Elasticsearch some 
>>> breathing room and enable it to index up to 10K docs per second.
>>> 
>>>> On Thursday, January 15, 2015 at 10:47:31 AM UTC+5:30, David Pilato wrote:
>>>> Logstash has a Twitter input, so you can extract content from Twitter and 
>>>> send it to Elasticsearch. No need to have Redis here. 
>>>> 
>>>> --
>>>> David ;-)
>>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>> 
>>>>> On 15 Jan 2015, at 00:02, Chinch Pokli <[email protected]> wrote:
>>>>> 
>>>>> Thanks. I'll have a look at the raw option.
>>>>> Regarding Logstash, I don't fully understand its utility. It says that 
>>>>> it can take messages from a Redis server. But if I have to set up Redis, 
>>>>> I could simply use the Redis river to index into Elasticsearch. Is there 
>>>>> any additional benefit that Logstash would give me?
>>>>> 
>>>>>> On Thursday, January 15, 2015 at 4:06:12 AM UTC+5:30, David Pilato wrote:
>>>>>> You should look at the raw option or, better, look at Logstash.
>>>>>> 
>>>>>> My 2 cents.
>>>>>> 
>>>>>> David
>>>>>> 
>>>>>>> On 14 Jan 2015, at 23:29, Chinch Pokli <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am using Elasticsearch to index the Twitter stream. Until recently I was 
>>>>>>> using the official river, which was working great, but I realized that it 
>>>>>>> is throwing out much of the data (e.g. it is not storing the number of 
>>>>>>> followers, etc.).
>>>>>>> 
>>>>>>> Is there a way to make the river store all the data? If not, I am 
>>>>>>> fine with writing streaming code which will stream and index. But I 
>>>>>>> have a concern. How many documents can Elasticsearch index per second? 
>>>>>>> I might eventually need to index almost 10,000 documents (each document 
>>>>>>> = 2 KB) per second (the current requirement is 100 documents per 
>>>>>>> second). Is this even feasible? If yes, do I need to make any special 
>>>>>>> modifications?
>>>>>>> 
>>>>>>> Thanks-in-advance!!
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "elasticsearch" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>>> an email to [email protected].
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/elasticsearch/da547692-903b-4793-a77e-fd5f0b5a01b7%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.

