Sorry for the delay, I had to take some unplanned leave and I wasn't able 
to get to this while I was out. With some more testing I was able to get 
~10k documents a second but I had to make some code changes.

1: I switched to the transport client in our Java code.
2: A single client didn't seem able to keep up, so what I did in the 
code was actually spawn a couple of transport clients, each with its own 
bulk processor with concurrent requests set to 32. The part of our code that 
reads messages in from Kafka then submits them at random to these 
various transport clients. Is anyone else having to do this, or should a single 
transport client be able to keep up?
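For anyone curious what the fan-out looks like, here is a minimal, self-contained sketch of the pattern described above. The `IndexerShard` class and `dispatch` method are hypothetical stand-ins for illustration only; in the real code each shard would wrap a `TransportClient` plus a `BulkProcessor` built with `setConcurrentRequests(32)`, and `add` would enqueue an index request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one transport client + bulk processor pair.
// Real code: org.elasticsearch.client.transport.TransportClient with its own
// org.elasticsearch.action.bulk.BulkProcessor (concurrent requests set to 32).
class IndexerShard {
    final AtomicInteger indexed = new AtomicInteger();

    void add(String doc) {
        // Real code: bulkProcessor.add(new IndexRequest(...).source(doc))
        indexed.incrementAndGet();
    }
}

public class RandomDispatch {
    // The Kafka consumer loop hands each message to a randomly chosen
    // client, spreading the bulk traffic across independent connections.
    static int dispatch(int clientCount, int messageCount) {
        List<IndexerShard> shards = new ArrayList<>();
        for (int i = 0; i < clientCount; i++) {
            shards.add(new IndexerShard());
        }
        for (int msg = 0; msg < messageCount; msg++) {
            int pick = ThreadLocalRandom.current().nextInt(clientCount);
            shards.get(pick).add("doc-" + msg);
        }
        // Every message should land on exactly one shard.
        return shards.stream().mapToInt(s -> s.indexed.get()).sum();
    }

    public static void main(String[] args) {
        System.out.println("total indexed = " + dispatch(4, 10_000));
    }
}
```

Random assignment keeps the dispatch code trivial; round-robin would give a more even spread per batch, but at these message rates the difference washes out.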

I wasn't able to get much more out of it because the CPU usage started to 
get really high, but I don't think that's an Elasticsearch thing; I think 
it's because we are doing so many regex operations.

While sustaining ~10k documents a second, the network output was only about 
5 MB a second, so we don't seem to be network-bound.

I did determine that we are basically able to pull from Kafka as fast as 
the messages come in when NOT doing inserts into Elasticsearch, so I don't 
think Kafka is the problem.

I plan on doing some testing today with multiple consumers running 
to see if we can hit our ~40k inserts per second goal (4 consumers doing 
~10k each).
