Hi all, I'm currently working on a project where Elasticsearch is our backend, but we have been running into issues with insert rates. Some background: our cluster is four physical boxes, each with 32 CPU cores and 252 GB of RAM. Each box runs a data node, a master node, and a search node. On two other machines with the same hardware specs we run a Java app that pulls our data from Kafka, adjusts it, and then inserts it into Elasticsearch.
In the Java app we are using the "node" style client along with the BulkProcessor class to handle our inserts. Everything is running on Elasticsearch 0.90.5 with Java 1.7.0_45.
The issue is that we can't seem to get over about 7k inserts per second per Java app (so 14k total, since we have two instances of the app running). At around 6.5k-7k, the Elasticsearch inserts start to lag behind how fast we're pulling the data from Kafka. Our initial thought was that the "data adjusting" stage of our app was causing the latency, but we've been able to rule that out by adding some metrics around that part of the app. Everything is fine until we reach the point where we do the inserts.
My question: are there any other users out there pushing ~10k inserts per second (that is our goal) using the Java API? If so, would you mind sharing some of the settings you are using? We've tried adjusting the BulkProcessor concurrent request count and bulk size, but nothing seems to really improve it.
One thing I've noticed in our monitoring is that sometimes our Elasticsearch client seems to get backed up. We'll see inserts chugging along at 6k per second, then they start dropping, and after a few seconds they come back up. No GCs happen during this time, so I'm not sure what would be causing that. The health of the boxes while we're running looks fine (both on the ES nodes and where our app lives), and inside the JVM everything seems to be OK as well (no huge GCs or anything).
I've searched this list and found people talking about doing 10k inserts per second, so we know it's possible. We just can't seem to find the right setup to get to that number. Any suggestions or tips would be greatly appreciated!
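In case it helps anyone compare numbers, here is a minimal sketch of the kind of per-stage timing mentioned above. It is illustrative only: `StageTimer` and the stage names are hypothetical, not code from our app, and it uses only the JDK (nothing from the Elasticsearch client).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: accumulates elapsed time and call counts per pipeline stage
// so the "adjust" stage can be compared against the "insert" stage.
final class StageTimer {
    private final ConcurrentMap<String, AtomicLong> totalNanos =
            new ConcurrentHashMap<String, AtomicLong>();
    private final ConcurrentMap<String, AtomicLong> counts =
            new ConcurrentHashMap<String, AtomicLong>();

    // Returns the counter for a stage, creating it atomically on first use.
    private static AtomicLong counter(ConcurrentMap<String, AtomicLong> map, String stage) {
        AtomicLong existing = map.get(stage);
        if (existing == null) {
            AtomicLong fresh = new AtomicLong();
            existing = map.putIfAbsent(stage, fresh);
            if (existing == null) {
                existing = fresh;
            }
        }
        return existing;
    }

    // Records one execution of a stage that took elapsedNanos nanoseconds.
    void record(String stage, long elapsedNanos) {
        counter(totalNanos, stage).addAndGet(elapsedNanos);
        counter(counts, stage).incrementAndGet();
    }

    // Number of recorded executions for a stage (0 if never recorded).
    long count(String stage) {
        AtomicLong c = counts.get(stage);
        return c == null ? 0L : c.get();
    }

    // Mean time per execution in milliseconds (0.0 if never recorded).
    double meanMillis(String stage) {
        long n = count(stage);
        if (n == 0L) {
            return 0.0;
        }
        return totalNanos.get(stage).get() / 1e6 / n;
    }
}
```

Each stage would be wrapped with `System.nanoTime()` before and after, and the difference passed to `record`. Comparing `meanMillis("adjust")` with `meanMillis("insert")` is the sort of check that pointed us at the insert step rather than the adjusting step.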
