Hi all, I'm currently working on a project where Elasticsearch is our 
backend, but we have been running into issues with insert rates. Some 
background: our cluster is four physical boxes, each with 32 CPU cores 
and 252 GB of RAM. Each box runs a data node, a master node and a search 
node. On two other machines with the same hardware specs we run a Java 
app that pulls our data from Kafka, adjusts it, and then inserts it into 
Elasticsearch. 

In the Java app we are using the "node" style client along with the 
BulkProcessor class to handle our inserts. Everything is running on 
Elasticsearch 0.90.5 with Java 1.7.0_45. 

The issue is that we can't seem to get over about 7k inserts per second 
per Java app (so 14k total, since we have two instances of our Java app 
running). Around 6.5k-7k, the Elasticsearch inserts start to lag behind 
how fast we're pulling the data from Kafka. Our initial thought was that 
the "data adjusting" stage of our app was causing the latency, but we've 
been able to rule that out by adding some metrics around that part of the 
app. Everything is fine until we reach the point where we do the inserts. 

My question: are there any other users out there pushing ~10k inserts per 
second (that is our goal) using the Java API? If so, would you mind 
sharing some of the settings you are using? We've tried adjusting the 
BulkProcessor concurrent request count and bulk size, but nothing seems 
to really improve it. 

One thing I've noticed in our monitoring is that sometimes our 
Elasticsearch client seems to get backed up. We'll see inserts chugging 
along at 6k, then they start dropping, and after a few seconds they come 
back up. No GCs happen during this time, so I'm not sure what would be 
causing that.
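
As a rough back-of-the-envelope check on numbers like these, here is a 
small sketch. It is not part of the Elasticsearch API. It assumes the 
client keeps at most a fixed number of bulk requests in flight, so that 
steady-state throughput is roughly (requests in flight) x (docs per bulk) 
/ (bulk round-trip time). The class name, method names and sample values 
are all hypothetical and only there for illustration.

```java
// Back-of-the-envelope model, NOT part of the Elasticsearch API.
// Assumption: the client keeps at most `inFlight` bulk requests outstanding,
// each carrying `docsPerBulk` documents and taking `bulkMillis` per round trip.
final class BulkThroughputEstimate {

    // Upper bound on documents per second under the assumption above.
    static double maxDocsPerSecond(int docsPerBulk, int inFlight, double bulkMillis) {
        if (docsPerBulk <= 0 || inFlight <= 0 || bulkMillis <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return inFlight * (double) docsPerBulk * 1000.0 / bulkMillis;
    }

    // Slowest bulk round trip (in ms) that still sustains targetDocsPerSecond.
    static double maxBulkMillis(int docsPerBulk, int inFlight, double targetDocsPerSecond) {
        if (docsPerBulk <= 0 || inFlight <= 0 || targetDocsPerSecond <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return inFlight * (double) docsPerBulk * 1000.0 / targetDocsPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical settings: 1000 docs per bulk, 1 request in flight.
        int docsPerBulk = 1000;
        int inFlight = 1;
        System.out.println("docs/s at 140 ms per bulk: "
                + maxDocsPerSecond(docsPerBulk, inFlight, 140.0));
        System.out.println("ms per bulk needed for 10k docs/s: "
                + maxBulkMillis(docsPerBulk, inFlight, 10000.0));
    }
}
```

The real round-trip time has to be measured, for example by timing each 
bulk in the beforeBulk/afterBulk callbacks of a BulkProcessor.Listener. 
If the measured value is well below what this model predicts for the 
observed rate, the limit is probably somewhere other than the bulk 
requests themselves.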

The health of the boxes while we're running looks fine (both on the ES 
nodes and where our app lives), and inside the JVM everything seems to 
be OK as well (no huge GCs or anything). I've searched this list and 
found people talking about doing 10k inserts per second, so we know it's 
possible; we just can't seem to find the right setup to get to that 
number. Any suggestions or tips would be greatly appreciated!
