Jörg, I went back to square one with some of the code based on your
suggestions, and we now seem to be inserting into ES at the same rate we are
pulling from Kafka (which is what we wanted). I am using one transport
client with ~100 concurrent requests. That alone was not enough though; the
biggest changes that seem to have helped us get what we wanted were
changing the refresh time to 10s and the shards from 5 to 16. Now we are
doing about 17k inserts per second on each consumer instance.

The only issue we've seen now is that after some time Elasticsearch itself
becomes a bit unstable. It appears to be related to merging, as the logs
show really long merge times (multiple minutes) right around the time
we start seeing issues. My guess is that this is a topic for another thread :)

Geet, we are basically just using the BulkProcessor object as-is with a
wrapper around it so all of our worker threads can use the same
BulkProcessor:

    BulkProcessor.builder(client, new BulkProcessor.Listener() {
        @Override
        public void beforeBulk(long executionId, BulkRequest request) {
            logger.info("Bulk Going to execute new bulk composed of {} actions",
                    request.numberOfActions());
            getInserts.mark();
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request,
                BulkResponse response) {
            logger.info("Executed bulk composed of {} actions",
                    request.numberOfActions());
            getInserts.mark(response.getItems().length);
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request,
                Throwable failure) {
            logger.warn("Error executing bulk", failure);
        }
    })
    .setBulkActions(maxBulkCount)
    .setConcurrentRequests(bulkThreads)
    .setFlushInterval(TimeValue.timeValueMillis(maxBulkTimeoutMs))
    .build();

And then use the add() method to add your documents.
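
For illustration, here is a rough sketch of what a wrapper shared by several
worker threads can look like. It is not the actual wrapper from this thread.
BulkSink, SharedBulkIndexer and SharedBulkIndexerDemo are hypothetical names,
and BulkSink stands in for the BulkProcessor so the sketch compiles without
Elasticsearch on the classpath. In real code, index() would delegate to the
BulkProcessor's add() method, which as far as I know is safe to call from
multiple threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the single BulkProcessor method the workers need.
interface BulkSink {
    void add(String document);
}

// One instance per JVM, shared by all worker threads.
final class SharedBulkIndexer {
    private final BulkSink sink;
    private final AtomicInteger submitted = new AtomicInteger();

    SharedBulkIndexer(BulkSink sink) {
        this.sink = sink;
    }

    // Called concurrently by the workers; the sink must be thread-safe.
    void index(String document) {
        sink.add(document);
        submitted.incrementAndGet();
    }

    int submittedCount() {
        return submitted.get();
    }
}

final class SharedBulkIndexerDemo {
    // Starts `workers` threads that each push `docsPerWorker` documents
    // through one shared indexer; returns how many documents the sink received.
    static int run(int workers, int docsPerWorker) {
        AtomicInteger received = new AtomicInteger();
        SharedBulkIndexer indexer =
                new SharedBulkIndexer(doc -> received.incrementAndGet());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool.submit(() -> {
                for (int i = 0; i < docsPerWorker; i++) {
                    indexer.index("{\"worker\":" + id + ",\"seq\":" + i + "}");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println("documents received: " + run(4, 1000));
    }
}
```

The point of the design is that the workers never create their own client or
processor; they only call index(), and batching and flushing stay in one place.
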
On Wednesday, January 1, 2014 5:27:40 AM UTC-7, Jörg Prante wrote:
>
> There is no need for more than one client instance per JVM. You can
> increase the bulk request concurrency in the BulkProcessor with
> "setConcurrentRequests" to avoid blocking threads, until you reach the
> sweet spot where client submitting resources matches the indexing capacity
> of the cluster.
>
> This is a matter of dynamic balance, which is different from setup to
> setup. The default request concurrency is 1. For a higher value, you have
> to prepare enough heap resources and maybe run your doc construction in
> multiple threads to exploit the advantages.
>
> As a rule of thumb, use 4 * available cores for the concurrency, and
> ~1-10MB for the bulk size.
>
> For example, I often operate with a bulk size of 1000 docs and a
> concurrency level of 32.
>
> Jörg
>
>
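
To make the rule of thumb quoted above concrete, here is a small sketch of the
arithmetic. BulkSizing and its method names are hypothetical, and the 5 MB
target and 2 KB average document size are assumed values picked only for the
example; measure your own documents before settling on numbers. The results
would be candidates for setConcurrentRequests and setBulkActions, still to be
tuned against what the cluster can actually absorb.

```java
final class BulkSizing {
    // Rule of thumb from the quoted advice: 4 concurrent bulk requests per core.
    static int concurrentRequests(int cores) {
        return 4 * cores;
    }

    // Documents per bulk so that one request stays near targetBulkBytes.
    // Always returns at least 1, even if a single document exceeds the target.
    static int bulkActions(long targetBulkBytes, long avgDocBytes) {
        if (avgDocBytes <= 0) {
            throw new IllegalArgumentException("avgDocBytes must be positive");
        }
        return (int) Math.max(1L, targetBulkBytes / avgDocBytes);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("concurrent requests: " + concurrentRequests(cores));
        // Assumed values: 5 MB per bulk request, 2 KB average document.
        System.out.println("bulk actions: " + bulkActions(5L * 1024 * 1024, 2048));
    }
}
```
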