Sorry for the delay; I had to take some unplanned leave and wasn't able to get to this while I was out. With some more testing I was able to get ~10k documents a second, but I had to make some code changes:
1: I changed to the transport client in our Java code.
2: It seemed as if one client wasn't able to keep up, so what I did in the code was actually spawn a couple of transport clients, each with its own BulkProcessor with concurrent requests set to 32. The part of our code that reads messages in from Kafka then submits them at random to these various transport clients.

Is anyone else having to do this, or should a single transport client be able to keep up? I wasn't able to get much more out of it because the CPU usage started to get really high, but I don't think that's an Elasticsearch thing; I think it's because we are doing so many regex tasks. While indexing around ~10k documents a second the network output was only about 5 MB a second, so we don't seem to be blocked there. I did determine that we are basically able to pull from Kafka as fast as the messages come in when NOT doing inserts into Elasticsearch, so I don't think that is the problem. I plan on doing some testing today where we have multiple consumers running, to see if we can hit our ~40k inserts per second goal (4 consumers doing ~10k each).
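For reference, the fan-out described above looks roughly like this. This is a minimal sketch against the TransportClient/BulkProcessor Java API of that era; the hostname, pool size, and batch size are placeholder assumptions, not values from our actual setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class KafkaToEsFanOut {
    // Assumption: a pool of 2 clients, since one couldn't keep up.
    private static final int CLIENT_POOL_SIZE = 2;
    private final List<BulkProcessor> processors = new ArrayList<>();

    public KafkaToEsFanOut() {
        for (int i = 0; i < CLIENT_POOL_SIZE; i++) {
            // "es-node1":9300 is a placeholder for your cluster address.
            TransportClient client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("es-node1", 9300));
            BulkProcessor bp = BulkProcessor.builder(client, new BulkProcessor.Listener() {
                @Override public void beforeBulk(long id, BulkRequest req) {}
                @Override public void afterBulk(long id, BulkRequest req, BulkResponse resp) {}
                @Override public void afterBulk(long id, BulkRequest req, Throwable t) {
                    t.printStackTrace(); // surface failed bulk requests
                }
            })
            .setConcurrentRequests(32) // as in the test described above
            .setBulkActions(1000)      // placeholder batch size
            .build();
            processors.add(bp);
        }
    }

    // Called from the Kafka consumer loop: pick a processor at random,
    // spreading load across the client pool.
    public void index(String index, String type, String json) {
        BulkProcessor bp = processors.get(
            ThreadLocalRandom.current().nextInt(processors.size()));
        bp.add(new IndexRequest(index, type).source(json));
    }
}
```

Each BulkProcessor batches and flushes asynchronously on its own, so the Kafka-reading thread only ever does a cheap `add()` call; the random pick is just a simple way to balance across the pool.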
