Sorry for the delay; I had to take some unplanned leave and wasn't able to get to this while I was out. With some more testing I was able to get ~10k documents a second, but I had to make some code changes:
1: I changed to the transport client in our Java code.
2: It seemed as if one client wasn't able to keep up, so what I did in the code was actually spawn a couple of transport clients, each with its own BulkProcessor with concurrent requests set to 32. The part of our code that reads messages in from Kafka then submits them at random to these various transport clients.

Is anyone else having to do this, or should a single transport client be able to keep up? I wasn't able to get much more out of it because the CPU usage started to get really high, but I don't think that's an Elasticsearch thing; I think it's because we are doing so many regex tasks. While indexing around ~10k documents a second the network output was only about 5 MB a second, so we don't seem to be blocked there. I did determine that we are basically able to pull from Kafka as fast as the messages come in when NOT doing inserts into Elasticsearch, so I don't think that is the problem. I plan on doing some testing today where we have multiple consumers running, to see if we can hit our ~40k inserts per second goal (4 consumers doing ~10k each).
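For reference, the fan-out described above looks roughly like this. This is a minimal sketch against the TransportClient/BulkProcessor Java API of that era; the hostname, pool size, and batch size are placeholder assumptions, not values from our actual setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class KafkaToEsFanOut {
    // Assumption: a pool of 2 clients, since one couldn't keep up.
    private static final int CLIENT_POOL_SIZE = 2;
    private final List<BulkProcessor> processors = new ArrayList<>();

    public KafkaToEsFanOut() {
        for (int i = 0; i < CLIENT_POOL_SIZE; i++) {
            // "es-node1":9300 is a placeholder for your cluster address.
            TransportClient client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("es-node1", 9300));
            BulkProcessor bp = BulkProcessor.builder(client, new BulkProcessor.Listener() {
                @Override public void beforeBulk(long id, BulkRequest req) {}
                @Override public void afterBulk(long id, BulkRequest req, BulkResponse resp) {}
                @Override public void afterBulk(long id, BulkRequest req, Throwable t) {
                    t.printStackTrace(); // surface failed bulk requests
                }
            })
            .setConcurrentRequests(32) // as in the test described above
            .setBulkActions(1000)      // placeholder batch size
            .build();
            processors.add(bp);
        }
    }

    // Called from the Kafka consumer loop: pick a processor at random,
    // spreading load across the client pool.
    public void index(String index, String type, String json) {
        BulkProcessor bp = processors.get(
            ThreadLocalRandom.current().nextInt(processors.size()));
        bp.add(new IndexRequest(index, type).source(json));
    }
}
```

Each BulkProcessor batches and flushes asynchronously on its own, so the Kafka-reading thread only ever does a cheap `add()` call; the random pick is just a simple way to balance across the pool.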
