Re: Bulk throughput issues

tdjb Fri, 03 Jan 2014 07:06:43 -0800

Jörg, I went back to square one with some of the code based on your 
suggestions, and we now seem to be inserting into ES at the same rate we are 
pulling from Kafka (which is what we wanted). I am using one transport 
client with ~100 concurrent requests. That alone was not enough, though; 
the biggest changes that seem to have helped were changing the refresh 
interval to 10s and the number of shards from 5 to 16. Now we are 
doing about 17k inserts per second on each consumer instance.


The only issue we've seen now is that after some time Elasticsearch itself 
becomes a bit unstable. It appears to be related to the merging as the logs 
indicate really long merge times (multiple minutes) right around the time 
we start seeing issues. My guess is that is a topic for another thread :)

Geet, we are basically just using the BulkProcessor object as-is with a 
wrapper around it so all of our worker threads can use the same 
BulkProcessor:

BulkProcessor.builder(client, new BulkProcessor.Listener() {
    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        logger.info("Bulk Going to execute new bulk composed of {} actions", request.numberOfActions());
        getInserts.mark();
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        logger.info("Executed bulk composed of {} actions", request.numberOfActions());
        getInserts.mark(response.getItems().length);
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        logger.warn("Error executing bulk", failure);
    }
}).setBulkActions(maxBulkCount)
  .setConcurrentRequests(bulkThreads)
  .setFlushInterval(TimeValue.timeValueMillis(maxBulkTimeoutMs))
  .build();




And then use the add() method to add your documents.
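
For anyone wondering what the wrapper looks like, here is a rough sketch of 
the idea rather than our actual code. The names SharedBulkSink and 
SharedBulkSinkDemo are made up for illustration, and a plain Consumer stands 
in for the BulkProcessor so the example runs without Elasticsearch on the 
classpath. In real use the delegate would be something that calls add() on 
the single BulkProcessor instance.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical wrapper: every worker thread shares this one instance. */
final class SharedBulkSink<T> {
    // Stand-in for the shared BulkProcessor (assumption for this sketch).
    private final Consumer<T> delegate;

    SharedBulkSink(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    // Synchronized so the sketch stays safe even if the delegate is not thread-safe.
    synchronized void add(T doc) {
        delegate.accept(doc);
    }
}

final class SharedBulkSinkDemo {
    /** Starts the given number of workers, each adding docs to one shared sink. */
    static int run(int workers, int docsPerWorker) {
        AtomicInteger received = new AtomicInteger();
        SharedBulkSink<String> sink =
                new SharedBulkSink<>(doc -> received.incrementAndGet());

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            final int workerId = w;
            pool.execute(() -> {
                for (int i = 0; i < docsPerWorker; i++) {
                    sink.add("doc-" + workerId + "-" + i);
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println("documents received: " + run(4, 1000));
    }
}
```

The point is only that there is a single sink per JVM and the worker threads 
never create their own.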

On Wednesday, January 1, 2014 5:27:40 AM UTC-7, Jörg Prante wrote:
>
> There is no need for more than one client instance per JVM. You can 
> increase the bulk request concurrency in the BulkProcessor with 
> "setConcurrentRequests" to avoid blocking threads, until you reach the 
> sweet spot where client submitting resources matches the indexing capacity 
> of the cluster. 
>
> This is a matter of dynamic balance, which is different from setup to 
> setup. The default request concurrency is 1. For a higher value, you have 
> to prepare enough heap resources and maybe run your doc construction in 
> multiple threads to exploit the advantages.
>
> As a rule of thumb, use 4 * available cores for the concurrency, and 
> ~1-10MB for the bulk size.
>
> For example, I often operate with a bulk size of 1000 docs and a 
> concurrency level of 32.
>
> Jörg
>
>
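
As a starting point, the rule of thumb quoted above can be written down as a 
small calculation. This is only a sketch: the class BulkSizing and its 
methods are hypothetical names, and the average document size is something 
you would have to measure for your own data.

```java
/** Hypothetical helper turning the quoted rule of thumb into numbers. */
final class BulkSizing {
    /** Rule of thumb from the quoted message: 4 * available cores. */
    static int suggestedConcurrency(int availableCores) {
        return 4 * availableCores;
    }

    /** How many docs of a given average size fit into a target bulk size. */
    static int docsPerBulk(long targetBulkBytes, long avgDocBytes) {
        if (avgDocBytes <= 0) {
            throw new IllegalArgumentException("avgDocBytes must be positive");
        }
        return (int) Math.max(1, targetBulkBytes / avgDocBytes);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long fiveMb = 5L * 1024 * 1024; // somewhere inside the ~1-10MB range
        long avgDocBytes = 5L * 1024;   // assumed average doc size, measure your own
        System.out.println("concurrency: " + suggestedConcurrency(cores));
        System.out.println("docs per bulk: " + docsPerBulk(fiveMb, avgDocBytes));
    }
}
```

The results would then feed setConcurrentRequests() and setBulkActions(), 
and be tuned from there, since the right balance differs from setup to 
setup.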
